Experienced economist John Cochrane tested Refine (a startup's tool) and Claude (an LLM) on a draft booklet and received critique comments comparable to those of top human referees, plus runnable MATLAB code to update his graphs. The anecdote foregrounds a near-term capability: generative tools can now perform peer-review-style critique and some reproducible research tasks.
If AI reliably produces referee-quality reviews and reproducible code, norms around academic publishing, tenure, and research funding will need rethinking: who counts as an expert, how credit is assigned, and which startups are worth backing.
Tyler Cowen
2026.04.07
85% relevant
The article documents Claude Code autonomously extending and verifying an economics paper—doing web scraping, writing code, running regressions and producing a memo—and cites Yiqing Xu’s automated verification work; both map directly onto the existing idea that AI tools will function as referees or verifiers in academic research.
Robert VerBruggen
2026.03.26
88% relevant
The article documents direct experiments (Michael Wiebe running ChatGPT/Refine on an AER paper, and a study using 150 Claude Code agents) showing that AI catches some errors but misses many and can alter meaning; this supports and nuances the existing claim that AI will act as a new form of academic reviewer rather than a perfect substitute for human judgment.
Arnold Kling
2026.03.25
90% relevant
Tyler Cowen’s suggestion that we could ask AIs to certify, rank, and continuously update the quality/impact of research papers maps directly to the claim that AI can act as an academic referee — taking over epistemic gatekeeping roles formerly performed by journals and peer review.
Tyler Cowen
2026.03.20
75% relevant
The first link, 'Using LLMs to study deregulation,' directly illustrates academic and policy researchers employing LLMs as analytic assistants (i.e., referees/critics) to evaluate regulatory texts or simulate policy effects, which is the core claim of the existing idea that AI will act as an academic referee for scholarship and peer review.
Tyler Cowen
2026.03.19
90% relevant
Two linked items are concrete examples of AI systems applied to evaluate research and models: 'Show Me The Model' (flags hidden assumptions and inconsistencies in text) and 'Frontier Graph' (open‑source exploration of 240K economics papers), which instantiate the idea that AI is being used to adjudicate and surface problems in academic work.
Tyler Cowen
2026.03.09
90% relevant
Tyler Cowen's question — how journals should adapt to rapid AI advances and an expected surge in submissions — directly connects to the existing idea that AI will act as a reviewer/referee (automating checks, triage, or even substantive evaluation) and forces journals to consider integrating AI into peer‑review workflows and standards.
Scott
2026.03.05
70% relevant
A legendary mathematician's documented, productive interaction with an LLM shows the model functioning as an informal referee or collaborator: the episode exemplifies the emerging role of LLMs as tools that can check, suggest, or produce proofs, and thus influence standards for verification and credit.
Michael Inzlicht
2026.03.04
70% relevant
The article reports a presenter delivering a talk using 100% AI‑generated slides — a direct example of AI moving from a backend tool into the visible apparatus of academic judgment and presentation, which connects to debates about AI’s role in evaluating, synthesizing, and representing research in scholarly venues.
Arnold Kling
2026.02.25
100% relevant
Cochrane’s on-record trial of Refine and Claude Opus 4.6 produced organized referee comments and MATLAB code; he and the toolmakers (López and Golub) are the concrete actors cited.