A study finds large language model (LLM) systems produce research ideas rated as more novel than those from human experts. But when implemented, the AI-generated ideas do not achieve better outcomes. This suggests a gap between AI ideation and real-world execution quality.
It tempers AI boosterism by showing that human agency and execution still drive impactful research, informing policy and the institutional adoption of AI in science.
BeauHD
2025.09.12
75% relevant
The article cites a new MIT study finding that most corporate AI pilots fail to produce material benefits, which aligns with the thesis that AI ideation outpaces real-world execution quality.
Tyler Cowen
2025.09.03
70% relevant
Like the finding that AI can generate novel research ideas without superior outcomes, this article argues that AI may accelerate preclinical tasks but won't improve the crucial clinical phase, where 80% of costs and risks reside; as a result, real-world drug outcomes and economics may not improve as hyped.
David Pinsof
2025.08.19
60% relevant
The noted CS paper, which claims that persuasive power plateaus with additional text training, echoes this theme: scaling an AI capability (here, exposure/data) doesn't automatically yield proportionally stronger real-world effects.
Arnold Kling
2025.08.16
60% relevant
The article argues that LLMs rely on pattern-matching rather than logical reasoning, as illustrated by faulty debugging advice, and identifies stylistic tells (e.g., 'not just X, but Y') that avoid falsifiable claims. This echoes the broader point that AI can generate plausible ideas and text without superior real-world execution or reasoning.
Scott
2025.08.14
45% relevant
The reported OpenAI/DeepMind gold-level performance on the International Math Olympiad is a counterpoint update: not just novel ideas but improved execution on hard, formal reasoning tasks, narrowing the ideation–execution gap the study highlights.
Aporia
2025.08.05
100% relevant
Chenglei Si et al. report that AI-generated ideas are rated as more novel, yet yield no better outcomes when executed.