Daily arXiv Scan: April 8, 2026
Daily arXiv Scan: April 8, 2026
80 papers scanned across cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML
Models: Kimi K2, Claude Opus 4.6, GPT-5 (3 succeeded, 1 failed)
Failed: Gemini 2.5 Pro (HTTP Error 503: Service Unavailable)
Consensus Picks (3+ models)
Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis
Selected by: Kimi K2, Claude Opus 4.6, GPT-5
Model Analysis:
- Kimi K2: A five-line code change that turns opaque LLM analytics into something an IRB or regulator can reason about. Should be mandatory for any epidemiology or finance agent.
- Claude Opus 4.6: Deceptively simple, deeply important. The insight that prior contamination is invisible in standard LLM outputs should concern everyone building LLM-assisted decision systems.
- GPT-5: A lightweight, high-ROI auditing protocol to separate data-driven inference from memorized priors at inference time—a practical complement to governance/security frameworks.
Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives
Selected by: Kimi K2, Claude Opus 4.6, GPT-5
Model Analysis:
- Kimi K2: A sobering reminder that human-style manipulation works better on LLMs than on humans because the models never get tired. Multi-agent debate papers suddenly look dangerous rather than reassuring.
- Claude Opus 4.6: Surprisingly concrete and actionable. If you're building any system where multiple LLM agents deliberate, this paper should change your architecture.
- GPT-5: A necessary reality check. If you're running LLM juries, DAOs, or "many agents debate" features, assume you're shipping social-attack surfaces.
Pair Picks (2 models)
Gym-Anything: Turn any Software into an Agent Environment
Selected by: Claude Opus 4.6, GPT-5
Model Analysis:
- Claude Opus 4.6: This is infrastructure, not a demo. If it works as described, it's one of the most consequential enabling technologies for agentic AI deployment.
- GPT-5: Important enabling infrastructure. The surprise is the meta-move—using agents to generate their own testbeds.
Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries
Selected by: Kimi K2, Claude Opus 4.6
Model Analysis:
- Kimi K2: The first paper that makes "identity" a first-class systems-design primitive for frontier AI. Expect it to show up in EU AI-Act technical standards within 18 months.
- Claude Opus 4.6: This is the governance paper most AI policy people aren't reading but should be. Identity governance for machine agents is an infrastructure-level problem.
Unique Finds (1 model each)
Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment
Selected by: Claude Opus 4.6 only
- Moving alignment discourse from "find the compromise" to "navigate Pareto frontiers with tolerance for transient conflict" is a paradigm shift.
A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms
Selected by: GPT-5 only
- High leverage. If you ship MCP-based agents, steal this checklist today. This is the security layer the emerging agent economy actually needs.
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains
Selected by: Kimi K2 only
- A living demo that "agentic" is transitioning from demo-ware to GDP-level infrastructure.
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
Selected by: GPT-5 only
- A solid, pragmatic step toward evaluation that matches real deployment risks.
Exclusive Unlearning
Selected by: Kimi K2 only
- The first unlearning technique that product lawyers might actually love: you document what you keep, not what you delete.
Connecting Threads
The Infrastructure Governance Gap
The consensus picks reveal AI governance's blind spot: it's not in model alignment but in the plumbing. Machine identity governance (80:1 machine-to-human identity ratios) and epistemic contamination auditing address structural vulnerabilities that no amount of RLHF will fix. The field is over-indexed on model behavior and under-indexed on system-level governance.
Multi-Agent Systems Inherit Human Pathologies
Both consensus papers show that scaling through multiple agents doesn't automatically improve outcomes—it introduces emergent failure modes. LLM collectives reproduce conformity, deference to perceived expertise, and susceptibility to rhetorical manipulation. "More agents" ≠ wisdom of crowds without proper deliberation protocols.
Agentic AI Meets Ungovernable Scale
Gym-Anything enables agents to self-scaffold into any software while machine identity governance struggles with current scale. This collision course is the central tension in frontier AI deployment: expanding capability with roughly constant governance capacity.
Audit and Legibility as First-Class Design Requirements
Across picks, the pattern is making invisible dynamics visible—whether prior contamination, social manipulation, or premature alignment convergence. AI systems produce reasonable-looking outputs while concealing their generation processes. The emerging agenda is building legibility into AI systems at every level.
Statistical Baseline
- Total unique papers selected: 9
- Papers at 3+ agreement: 2 (expected by chance: 0.02)
- Papers at 2+ agreement: 4 (expected by chance: 0.90)
The strong consensus on epistemic blinding and social dynamics, combined with meaningful pair agreements, suggests genuine signal above random selection.
Recommended Reading (by agreement level)
- Epistemic Blinding — All three models, practical governance tool
- Social Dynamics Vulnerabilities — All three models, changes multi-agent architecture
- Gym-Anything — Two models, infrastructure for agent deployment
- Machine Identity Governance — Two models, regulatory framework foundation
Methodology: Papers selected from cs.AI, cs.CL, cs.LG, cs.HC, cs.SE, stat.ML by Kimi K2, Claude Opus 4.6, and GPT-5. Overlap statistics compare observed agreement rates against random selection baselines. Individual model scans available on request.