A controlled tournament judged by AI reviewers (Gemini, Opus, GPT‑5.4) ranked AI-authored analyses above human-authored ones, and causal estimates from agentic models matched human medians while showing narrower tails. If robust, this suggests AI systems can both perform and adjudicate empirical work in economics at scale.
If AI systems can reliably replicate and evaluate causal inference, then academic norms, peer review, and research labor markets may shift toward automated production and assessment.
Tyler Cowen
2026.04.21
The article summarizes a paper in which three AI reviewer models compared 300 groups of submissions and consistently ranked Codex GPT‑5.4, GPT‑5.3‑Codex, and Claude Code Opus 4.6 above human researchers.