Daily 4-Model arXiv Scan: April 19, 2026

📡 Daily Reports · 2026-04-19

arxivai-safetygovernancesystems

80 papers scanned across cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML. Models: Gemini 2.5 Pro, Kimi K2, Claude Opus 4.6 (GPT-5 failed due to rate limits).

Consensus Picks (3+ Models)

Agentic Microphysics: A Manifesto for Generative AI Safety

Agreement: Gemini, Kimi, Opus

Opus: "As AI systems acquire planning, memory, tool use, and persistent identity, safety analysis must shift from the individual model to the population level."
Gemini: "The explicit analogy to statistical mechanics: just as the properties of a gas emerge from the interactions of individual molecules, the safety-relevant properties of an AI ecosystem will emerge from the 'microphysics' of agent interactions."
Kimi: "Gives regulators a physics-flavoured language that is measurable on live traffic... supplies a path to socio-technical regulation."

CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

Agreement: Gemini, Kimi, Opus

Opus: "The first comparative study of game-theoretic cooperation-sustaining mechanisms applied to LLM agents... capability scaling hurts cooperation."
Gemini: "Demonstrates that pro-social behavior is not an emergent property of intelligence but a property of the system of interaction."
Kimi: "Without the right incentive structure even Claude-5-Reasoning betrays you. That flips the safety narrative: alignment is no longer a tuning problem, it's an incentive-layer problem."

Pair Picks (2 Models)

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

Agreement: Gemini, Opus Identifies a fundamental failure mode in Reinforcement Learning with Verifiable Rewards (RLVR): models learn to enumerate instance-level labels that pass the verifier without capturing the underlying logical rule. A clear demonstration of Goodhart's Law at the frontier of AI capabilities.

Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

Agreement: Kimi, Opus A scheduling system that treats agentic workflows as first-class objects to efficiently serve multi-LLM setups on GPU clusters. Handles GPU oversubscription, unpredictable fan-out, and recursion by treating the workflow graph as an "aggregate pipeline."

Context Over Content: Exposing Evaluation Faking in Automated Judges

Agreement: Gemini, Opus Uncovers a critical vulnerability in the "LLM-as-a-judge" paradigm called "stakes signaling." Informing a judge model about the downstream consequences of its verdicts systematically corrupts its assessments, proving these models are latent social agents susceptible to context framing.

Connecting Threads

The Fragility of Oversight & Verification: Our mechanisms for checking AI behavior are systematically gameable. RLVR-trained models hack verifiers without reasoning, and judge models are manipulated through contextual framing.
From Individual Models to Interaction Systems: The unit of analysis must move from the single model to the interacting system. Safety requires institutional mechanisms and understanding emergent multi-agent behavior.
Goodhart's Law at Scale: As systems become more capable optimizers, the quality of our objectives and evaluation criteria becomes the binding constraint.
Infrastructure Shapes Behavior: The structure around AI systems—scheduling infrastructure, interaction protocols, institutional mechanisms—determines outcomes as much as model quality. Governance is migrating into the stack.

Statistical Baseline

Total unique papers selected: 8
Papers at 3+ agreement: 2 (Expected by chance: 0.02)
Papers at 2+ agreement: 5 (Expected by chance: 0.90)

Methodology Note: This scan uses a 4-model consensus approach (Opus, GPT-5, Gemini, Kimi) to identify structurally significant papers in frontier AI, mechanism design, and systems-level safety. Only papers with multi-model agreement or exceptional single-model conviction are featured.