AI Peer Review Outranks Humans

A controlled tournament judged by AI reviewers (Gemini, Opus, GPT‑5.4) found that AI-authored analyses ranked above human-authored ones, and that causal estimates from agentic models matched the human medians while showing narrower tails. If the result is robust, it suggests AI systems can both perform and adjudicate empirical work in economics at scale. Should such systems reliably replicate and evaluate causal inference, academic norms, peer review, and research labor markets may shift toward automated production and assessment.

Sources

A Comparison of Agentic AI Systems and Human Economists

Tyler Cowen 2026.04.21 100% relevant

The article summarizes a paper in which three AI reviewer models compared 300 groups of submissions and consistently ranked those from Codex GPT‑5.4, GPT‑5.3‑Codex, and Claude Code Opus 4.6 above those from human researchers.