Evaluating GPT‑5 mainly against the immediately prior state of the art hides the real step change relative to GPT‑4. Coupled with a shorter release interval, this "boiling frog" evaluation habit normalizes rapid capability growth as incremental progress.
— If public and policy debates anchor on flattering benchmark comparisons, they will underestimate near‑term AI impacts and set miscalibrated governance priorities.
Alexander Kruel
2025.08.08
The post notes that the GPT‑5 release came four months sooner than the GPT‑3→GPT‑4 gap, and argues that most reviewers compare GPT‑5 to the most recent SOTA rather than to GPT‑4, dulling the perceived gains.