Daily arXiv Scan - March 15, 2026
Today's 4-model arXiv scan processed 80 papers across AI categories, with unanimous agreement on one critical finding and strong consensus on four others. The emerging pattern: individual agent intelligence creates collective failure modes that require system-level governance solutions.
Perfect Consensus (All 4 Models)
Increasing intelligence in AI agents can worsen collective outcomes
Selected by: Kimi K2, Claude Opus 4.6, Gemini 2.5 Pro, GPT-5
- Kimi K2: Shows counterintuitive finding that smarter agents don't lead to better system outcomes. Game-theoretic exploitation creates arms races that collapse into "tribal chaos" with catastrophic resource misallocation. Suggests bounded rationality may outperform unbounded optimization.
- Opus: Treats AI-agent populations as first real-world system where four key variables governing collective behavior can be systematically varied. Maps directly onto near-term deployment scenarios where heterogeneous agents share infrastructure. Challenges assumption that smarter agents = better outcomes.
- Gemini: Sharp warning using Minority Game framework showing that greater individual "intelligence" leads to over-synchronized behavior, extreme volatility, and lower resource utilization. Frames AI risk as emergent property of competing agent populations, not just single misaligned superintelligence.
- GPT-5: Flips default industry intuition ("smarter = better") when agents compete for finite resources. Better predictors herd on optimal actions, increasing variance and conflict. Shows capability externalities are real—uncoordinated scaling pushes systems into worse equilibria.
Strong Consensus (3 Models)
Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems
Selected by: Kimi K2, Gemini 2.5 Pro, GPT-5
- Kimi K2: Models attack surface as "gadget graph" where traditional software vulnerabilities combine with AI-specific weaknesses. Demonstrates end-to-end attacks where bit-flips get amplified through LLM attention mechanisms.
- Gemini: Shows AI agents as orchestrators for multi-stage attacks exploiting vulnerabilities across entire software/hardware stack. Minor hardware flaws weaponized by high-level LLM agents that craft inputs to trigger and exploit effects.
- GPT-5: Demonstrates how LLMs with autonomous discretion create explosive search space for exploit chaining. Security postures evaluated component-wise consistently underestimate real risk.
Security Considerations for Artificial Intelligence Agents
Selected by: Kimi K2, Claude Opus 4.6, GPT-5
- Kimi K2: Perplexity's operational security insights showing how agentic architectures break traditional security assumptions. Reveals capability escalation chains and long-horizon attacks enabled by persistent memory.
- Opus: Grounded in operational experience running general-purpose agentic systems at scale (millions of users). Provides practitioner-derived threat taxonomy when execution paths become non-deterministic.
- GPT-5: First public operations-grounded threat model for large-scale AI agents. Reframes LLM integration pitfalls as system-level CIA failures introduced by agent coordination.
Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models
Selected by: Claude Opus 4.6, Gemini 2.5 Pro, GPT-5
- Opus: Tackles genuinely hard systems problem where different requests traverse different computational paths through model graph. Introduces flexible task abstraction for arbitrary multimodal computation graphs.
- Gemini: Treats any-to-any model as dynamic computation graph with modality-aware paths. Shifts system design toward subgraph-level admission control and latency-SLO routing.
- GPT-5: Provides task abstraction and scheduler treating model as dynamic computation graph. Enables heterogeneous batching and path-specific scaling for better GPU utilization.
IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
Selected by: Claude Opus 4.6, Gemini 2.5 Pro, GPT-5
- Opus: Addresses poorly understood post-training RL scaling laws. Provides compute-optimal allocation framework for on-policy RL, preventing massive waste in RLHF/RLAIF spending.
- Gemini: Turns RL post-training from "black art" into engineering discipline. Discovers predictable trend for optimal parallel rollouts that increases with budget then saturates.
- GPT-5: Maps compute allocation for on-policy RL across three knobs: parallel rollouts, problems per batch, update steps. Identifies saturation regimes and iso-compute curves for principled budgeting.
Unique Picks
Claude Opus only:
- Normative Common Ground Replication (NormCoRe) — Methodological framework for studying norm emergence in multi-agent AI systems through "replication-by-translation"
Kimi K2 only:
- Decentralized Orchestration Architecture for Fluid Computing — Blockchain-based consensus for distributed AI workload orchestration with Byzantine fault tolerance
- LifeSim: Long-Horizon User Life Simulator — Simulates entire user lifetimes for evaluating AI assistant impact over realistic timescales
Gemini 2.5 Pro only:
- Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights — Reframes pretraining as creating "thicket" of solutions rather than single starting point
Connecting Threads
The Collective Intelligence Problem: Papers on agent populations and norm emergence reveal that individual optimization without coordination produces emergent adversarial dynamics. Governance interventions at the system level may matter more than capability benchmarks.
Infrastructure Shapes Behavior: Security and serving architecture papers demonstrate that deployment layer decisions create possibility spaces for agent behavior. System design determines both capabilities and failure modes.
From Alchemy to Engineering: Multiple papers push toward quantitative frameworks—RL scaling laws, security threat taxonomies, multimodal serving architectures—replacing intuition with principled design.
Compositionality as Double-Edged Sword: The same modularity enabling flexible workflows also enables attack chains. Safety properties emerge (or fail) at composition time, not component level.
Statistical Baseline
- 80 total papers processed
- 9 unique papers selected across all models
- 5 papers with 3+ model agreement (expected by chance: 0.07)
- 0 papers with exactly 2 model agreement (expected: 1.72)
- 71% consensus rate among agreed papers vs. 1% expected baseline
Recommended Reading Ranked by Agreement
- Universal (4/4): Increasing intelligence in AI agents can worsen collective outcomes — Critical for multi-agent system design
- Strong (3/4): Security Considerations for AI Agents — Operational security framework from Perplexity
- Strong (3/4): Cascade: Attack Gadgets in Compound AI — Cross-layer security threats
- Strong (3/4): Cornserve: Any-to-Any Multimodal Serving — Infrastructure for multimodal agents
- Strong (3/4): IsoCompute Playbook: RL Scaling Laws — Resource allocation for post-training
Methodology: Four frontier AI models (Kimi K2, Claude Opus 4.6, Gemini 2.5 Pro, GPT-5) independently selected 5 papers each from 80 candidates. Analysis focuses on papers with multi-model agreement as signals of broad research importance.