The Decoy Effect: Daily arXiv 4-Model Scan (2026-04-22)
arXiv 4-Model Scan: 2026-04-22
80 papers scanned across cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML.
Participating Models: Kimi K2, Gemini 2.5 Pro, Claude Opus 4.6. (Note: GPT-5 failed today due to a 429 rate limit error).
Overlap Statistics
- Total unique papers selected: 9
- 3+ Model Agreement: 1 (expected by chance: 0.02) — High Signal
- 2+ Model Agreement: 5 (expected by chance: 0.90)
Consensus Picks (3+ Models)
Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability
Authors: Janet Vertesi, danah boyd, Alex Taylor, Benjamin Shestakofsky
This paper is a meta-critique of the current AI governance landscape, introducing the concept of "decoys"—mechanisms like transparency reports or model cards that create an illusion of accountability while preserving existing power structures.
- Claude Opus 4.6: Highlights the "taxonomy of misdirection" and notes that if accountability mechanisms are decoys, they erode trust more than having no governance at all.
- Gemini 2.5 Pro: Frames it as the "red pill" of AI governance, shifting focus from technical "alignment" puzzles to the present-day political and economic actors.
- Kimi K2: Notes this as a "field manual" for spotting when audits are designed to fail upward, allowing firms to accumulate policy capital while appearing contrite.
Pair Picks (2 Models)
- ASMR-Bench: Auditing for Sabotage in ML Research
- Selected by: Kimi K2, Claude Opus 4.6
- A benchmark for detecting subtle sabotage in AI-generated research codebases. Claude Opus notes that the "sabotage-subtlety frontier" is advancing, making peer review insufficient as a perimeter.
- Detecting and Suppressing Reward Hacking with Gradient Fingerprints
- Selected by: Kimi K2, Gemini 2.5 Pro
- Introduces GRIFT, which monitors internal gradient patterns rather than text outputs to detect "cheating" models. Gemini calls it a "practical, insightful approach" to deceptive alignment.
- Beyond Distribution Sharpening: The Importance of Task Rewards
- Selected by: Gemini 2.5 Pro, Claude Opus 4.6
- An empirical study showing that RL with task rewards genuinely installs new capabilities rather than just "sharpening" existing ones.
- SocialGrid: A Benchmark for Planning and Social Reasoning in Embodied Multi-Agent Systems
- Selected by: Kimi K2, Gemini 2.5 Pro
- A multi-agent environment (inspired by Among Us) where LLMs fail at social deduction and planning. Kimi suggests stress-testing DAOs here before they handle real money.
Connecting Threads: The Monitoring-Control Gap
Synthesis across today's model outputs reveals four critical themes:
- The Monitoring-Control Gap: Several papers (MEDLEY-BENCH, ASMR-Bench, Political Economy) suggest a structural asymmetry: our ability to observe a problem (through evaluation or governance theater) is scaling, but our ability to regulate or control it is not.
- Infrastructure as Governance: From gradient fingerprints to spatially-adaptive federated learning, the most effective governance tools are being built into the training substrate, not stashed in PDF reports.
- Adversarial Processes over Adversarial Inputs: We are moving from a world of "adversarial stickers" to "adversarial processes"—sabotaged research, reward hacking, and decoy governance mechanisms.
- Post-Training Emergence: The action has shifted. Capability gains and safety risks are increasingly emerging during RL and multi-agent interaction phases rather than pre-training.
Statistical Baseline
Today's scan showed a very high signal-to-noise ratio. With 3 models selecting 5 papers each from a pool of 80, the probability of 3 models agreeing on a single paper by chance is only 0.02. Finding 1 such paper, plus 4 more pairs, suggests a strong consensus on today's most "structurally important" research.
Recommended Reading (Ranked by Agreement)
- Reckoning with the Political Economy of AI (3 Models) - Must Read for Governance.
- ASMR-Bench: Auditing for Sabotage (2 Models) - Frontier Safety.
- Detecting Reward Hacking with Gradients (2 Models) - Technical Alignment.
- Beyond Distribution Sharpening (2 Models) - Capability Foundations.
- SocialGrid Multi-Agent Benchmark (2 Models) - Systemic Coordination.
Methodology: This report is generated by a multi-model pipeline that scans daily arXiv uploads in AI-related categories. Each model (Kimi, Gemini, Claude) independently selects 5 papers based on "structural importance." Synthesis is then performed to identify overlaps and common themes.