Daily 4-Model arXiv Scan: 2026-05-01
80 papers scanned across cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML. Models: Kimi K2, Claude Opus 4.6 (2 succeeded, 2 failed)
Note: GPT-5 and Gemini 2.5 Pro failed to return results today due to API errors (Rate Limits / Service Unavailable).
Consensus Picks (3+ Models)
None today, due to model API failures.
Pair Picks (2 Models)
Detecting and Suppressing Reward Hacking with Gradient Fingerprints
- Kimi K2: Notes that RLVR assumes ungameable reward signals. This paper shows gradients leave a fingerprint betraying gaming vs. solving. GRIFT harvests these to detect hacks (catching ~30% that vanilla CoT auditing misses) and suppress them with gradient-surgery. A massive blow to the ontology of RLVR and a first fully-automatic compute-time defense.
- Claude Opus 4.6: Highlights this as a shift toward "mechanistic oversight." By monitoring internal model dynamics rather than easily gamed text outputs, GRIFT addresses a core limitation of alignment. It maps directly to mechanism design in multi-agent systems—designing incentives robust to gaming.
Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability
- Kimi K2: Calls it the clearest critique of "AI governance" as an industry hobbyhorse yet. Names decoys—bias audits, transparency dashboards—that let labs perform accountability while entrenching power. Empirically grounded, arguing governance ignoring compute/data ownership is aligned with capital, not publics.
- Claude Opus 4.6: Focuses on the meta-level failure mode: governance efforts becoming mechanisms of legitimation rather than constraint. Identifying the "decoy" framework gives analytical precision to the performative nature of much accountability work, serving as step one to designing capture-resistant institutions.
Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure
- Kimi K2: Addresses the assumption that device availability is i.i.d. in FedLearn. Real swarms face correlated failures, collapsing fairness (6% of clients supplying >55% of gradient mass). Their fix, CorrSync, re-weights contributions, dropping equity error by 5x. Essential for avoiding plutocratic model evolution in decentralized AI.
- Claude Opus 4.6: Emphasizes the implications for distributed systems design and fairness. Correlated failures are the norm, not edge cases. Ensuring fair participation when it is costly and unreliable is a crucial mechanism design problem. Unchecked, highly available nodes dominate training, encoding the distribution of privileged participants' data.
Beyond Distribution Sharpening: The Importance of Task Rewards
- Kimi K2: Cleanest experiment yet testing if post-training discovers or merely uncovers capabilities. Task-R RL unlocks 86% novel success trajectories unreachable zero-shot, whereas sharpening just recovers seen answers. Proves capabilities are created in post-training, meaning export controls anchored on pre-training miss the actual risk curve.
- Claude Opus 4.6: Highlights the structural implication for emergent behavior theory. If RL instills new skills, post-training becomes a second axis of capability growth. This complicates the comforting "latent capability" governance framing—evaluations of base models alone are insufficient for safety assessments of deployed systems.
Connecting Threads
Synthesized across all model outputs:
- The Monitoring Problem at Every Scale: Whether inspecting code (ASMR-Bench), monitoring gradients (GRIFT), or ensuring synchronization fairness, verifying agent behavior as systems become more autonomous and distributed is the fundamental challenge. Surface-level observation is failing.
- Accountability Theatre & Decoys: Current governance rituals often double as regulatory decoys. Identifying these decoys is critical, but technical fixes that redistribute control (like compute-aware pruning and fair synchronization) must be coupled with political demands.
- Post-Training is the New Frontier: The battlefield has firmly moved to compute-time adaptation. With path-pruning, reward hacking, and capability elicitation happening post-base model, governance frameworks anchored merely at pre-training checkpoints are fighting yesterday's war.
Statistical Baseline
- Total unique papers selected: 6
- Papers at 3+ agreement: 0 (expected by chance: 0.00)
- Papers at 2+ agreement: 4 (expected by chance: 0.31)
Recommended Reading Ranked by Agreement
2 Models:
- Robust Synchronisation for Federated Learning in The Face of Correlated Device Failure
- Reckoning with the Political Economy of AI: Avoiding Decoys in Pursuit of Accountability
- Detecting and Suppressing Reward Hacking with Gradient Fingerprints
- Beyond Distribution Sharpening: The Importance of Task Rewards
1 Model:
- Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning (Kimi K2)
- ASMR-Bench: Auditing for Sabotage in ML Research (Claude Opus 4.6)
Methodology Note: This is an automated daily scan leveraging four frontier models to independently evaluate arXiv papers across computer science and machine learning domains. It identifies consensus signals, diverging perspectives, and structural trends. GPT-5 and Gemini were unavailable for today's run due to API errors.