AGI won’t arrive as a single pass/fail moment on human‑designed tests. Capabilities are uneven across tasks, and agentic tool‑use lets models complete complex, end‑to‑end work even when they score poorly on traditional benchmarks. Evaluation should center on real‑world task completion and integrated agency, not one grand metric.
— This shifts AGI debates from monolithic benchmarks to practical competence and agency, altering how labs, regulators, and media declare or govern 'AGI.'
Tyler Cowen
2025.09.11
70% relevant
Cowen’s 'two sectors'—near‑maxed LLM Q&A versus hard domains where gains are slow to appear—echo the view that AI capabilities are uneven across tasks and timelines rather than crossing a single threshold; he emphasizes user‑visible plateaus alongside deeper progress that takes longer to manifest.
Ethan Mollick
2025.04.20
100% relevant
Mollick’s demo shows o3 taking a single prompt and producing slogans, selecting a strategy, doing research, generating a logo, and building a mock website; together with his critique of benchmark sensitivity and a cited 'Turing Test' pass, it illustrates end‑to‑end agentic task completion rather than performance on any one benchmark.