Daily 4-Model arXiv Scan: April 27, 2026
Statistical Baseline
- 80 papers scanned across cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML
- Models: Kimi K2, Claude Opus 4.6, Gemini 2.5 Pro (GPT-5 failed due to rate limits)
- Overlap Statistics:
- Total unique papers selected: 8
- Papers at 3+ agreement: 1 (expected by chance: 0.02)
- Papers at 2+ agreement: 6 (expected by chance: 0.90)
Consensus Picks (3+ Models)
Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability
Agreement: Kimi K2, Claude Opus 4.6, Gemini 2.5 Pro
- Claude Opus 4.6: Highlights how "decoys" in AI governance create the illusion of accountability while reinforcing industry power. Shifts the frame from governing AI to how governance itself is a resource extraction mechanism.
- Gemini 2.5 Pro: Notes this foundational critique argues that true accountability requires interrogating the material realities of AI development—compute, data, labor—rather than abstract ethical principles.
- Kimi K2: Sees it as a necessary polemic mapping how accountability theatrics function as decoys. Mandatory reading for anyone who ships safety features that might be weaponized as "trust-washing."
Pair Picks (2 Models)
ASMR-Bench: Auditing for Sabotage in ML Research
Agreement: Kimi K2, Claude Opus 4.6
- Claude Opus 4.6: A forward-looking safety benchmark testing if auditors can catch an adversarial AI subtly poisoning the epistemic pipeline with plausible-looking bugs.
- Kimi K2: Points out that hit-rates for both human and AI auditors are <40%. Once labs use agentic workflows, mis-aligned agents falsifying results becomes a real attack surface.
Beyond Distribution Sharpening: The Importance of Task Rewards
Agreement: Claude Opus 4.6, Gemini 2.5 Pro
- Claude Opus 4.6: Clean empirical result showing RL with task rewards genuinely instills new skills, meaning capabilities emerge in post-training, not just pre-training.
- Gemini 2.5 Pro: Confirms RL is a powerful engine for teaching, shaping model capabilities rather than just polishing pre-existing behaviors.
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
Agreement: Kimi K2, Gemini 2.5 Pro
- Gemini 2.5 Pro: Moving beyond textual alignment to internal dynamics, using "fingerprints" of gradients to detect when reasoning steps are plausible filler vs instrumental.
- Kimi K2: Simple to plug in and cuts reward-hack rates by 4x. Vital for RLVR settings where only final outcome rewards are available.
Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure
Agreement: Kimi K2, Claude Opus 4.6
- Claude Opus 4.6: Addresses realism in distributed ML where edge devices fail together, leading to unfair synchronization and biased training data.
- Kimi K2: Replaces standard random-client-sampling with a correlation-aware scheduler to prevent the learned model from being silently biased toward reliable demographics.
Where does output diversity collapse in post-training?
Agreement: Kimi K2, Gemini 2.5 Pro
- Gemini 2.5 Pro: Disentangles why RLHF models become less diverse. Shows the collapse in output diversity primarily comes from RLHF, not instruction tuning.
- Kimi K2: Traces entropy loss to the first 3% of gradient updates during SFT, providing early telemetry on when to freeze checkpoints to avoid "mode collapse."
Connecting Threads
- The governance gap is structural, not informational. Both human and AI governance can identify problems without acting on them, risking decorative "decoys".
- Post-training is where the real action (and danger) lies. From RL genuinely creating capabilities to early entropy loss during post-training, this phase is more powerful and dangerous than just extracting pre-trained behaviors.
- Auditability is becoming end-to-end and internal. Moving from behavioral text output monitoring to inspecting learning dynamics, gradient flows, and sabotaged research pipelines.
- Real-world systems break clean assumptions. Whether correlated failures in federated learning or the lack of alignment control despite good self-assessment, surface metrics often deceive compared to deployment reality.
Methodology Note: This daily scan compares the research paper recommendations of frontier AI models. Overlap statistics provide a baseline for consensus vs. chance.