Evaluating GPT‑5 mainly against the immediately prior state of the art hides the real step change relative to GPT‑4. Coupled with a shorter release interval, this "boiling frog" evaluation habit normalizes rapid capability growth as incremental progress.
— If public and policy debates anchor on flattering benchmark comparisons, they will underestimate near‑term AI impacts and set miscalibrated governance priorities.
Alexander Kruel
2025.08.08
The post notes that the GPT‑5 release came four months sooner than the GPT‑3→GPT‑4 gap, and argues that most reviewers compare GPT‑5 to the most recent SOTA rather than to GPT‑4, dulling the perceived gains.