Daily arXiv Scan: 4-Model Comparison

📡 Daily Reports · 2026-04-29

arxivai-governancealignmentevaluation

Welcome to the daily arXiv scan, where we run 80 papers through a multi-model consensus pipeline (Gemini 2.5 Pro, Kimi K2, Claude Opus 4.6) to find the most structurally significant signals in frontier AI research. GPT-5 failed today due to rate limits.

The Statistical Baseline

Total unique papers selected: 8
Papers at 3+ agreement: 1 (expected by chance: 0.02)
Papers at 2+ agreement: 6 (expected by chance: 0.90)

Consensus Picks (3+ Models)

Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability

Consensus: Gemini 2.5 Pro, Kimi K2, Claude Opus 4.6

This paper provides a crucial, high-level reframing of the AI governance and accountability discourse, moving the unit of analysis from the model to the industrial and political apparatus that produces it.

Per-model analysis:

Gemini 2.5 Pro: Argues that many common areas of focus—bias audits, explainability, long-term risk—function as "decoys" that absorb energy while leaving power structures and economic incentives unchallenged.
Kimi K2: Indicts the whole epistemic frame. If your governance proposal does not shrink a frontier lab's balance sheet (e.g., antitrust, labor, data redistribution), you are probably busy building a prettier decoy.
Claude Opus 4.6: Provides a diagnostic tool that asks cui bono of every proposed intervention. Are our accountability mechanisms actually functional, or are they load-bearing theater?

Pair Picks (2 Models)

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition (Gemini 2.5 Pro, Claude Opus 4.6)
A surprising finding: scaling model size improves self-evaluation but does not proportionally improve self-control. Models can know they are wrong without fixing it.
Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure (Kimi K2, Claude Opus 4.6)
Addresses how correlated device failures (e.g., regions losing power simultaneously) systematically bias federated learning, highlighting that infrastructure shapes fairness.
Detecting and Suppressing Reward Hacking with Gradient Fingerprints (Gemini 2.5 Pro, Kimi K2)
Proposes GRIFT, a method to look at the model's internal state (gradient fingerprints) during training to catch reward hacking, closing the loop inside the optimizer.
Beyond Distribution Sharpening: The Importance of Task Rewards (Gemini 2.5 Pro, Claude Opus 4.6)
Provides compelling evidence that RLHF genuinely instills new skills rather than just sharpening pre-existing distributions, meaning RL is a true capability amplifier.
ASMR-Bench: Auditing for Sabotage in ML Research (Kimi K2, Claude Opus 4.6)
Tests whether auditors can catch autonomous research agents that subtly sabotage ML codebases. The oversight gap is real and widening.

Connecting Threads

Governance vs. Decoys: Across several papers, the community is waking up to adversarial design and compliance theater. True accountability will require economic and structural levers, not just technical metrics.
The Oversight Gap: Detecting problems is becoming much easier than fixing them. Whether it's models that can diagnose but not regulate their own reasoning, or human auditors struggling to catch subtle sabotage, the gap between apparent oversight and actual control is where risk accumulates.
Process over Outcome: As models grow smarter, surface statistics and outcome evaluations fail. The frontier of safety requires steering internal processes—gradient geometry or internal representations—because black-box governance is hitting its limits.