Auto‑routing misclassifies task difficulty

Updated: 2025.08.07
GPT‑5 automatically decides which sub‑model to use and how long to reason, but it can misjudge what counts as 'hard.' The same prompt can be routed to a weak model on one run and to a deep‑reasoning model on the next, yielding very different results. This turns model selection into a hidden, stochastic variable for users. If routers routinely misclassify complexity, then AI reliability, benchmarking, and safety claims hinge on routing policies as much as on base‑model capability.
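
To make the hidden‑variable framing concrete, here is a toy simulation, not GPT‑5's actual routing policy (which is not public): a router scores a fixed prompt's difficulty with noise, a threshold picks the tier, and repeated runs of the identical prompt split across tiers. All parameters are made up; they are tuned only so that roughly two‑thirds of runs land on the fast tier, echoing the pattern Mollick reports.

```python
import random

# Toy model of a stochastic router (illustrative only; GPT-5's real
# routing policy is not public). The router estimates a prompt's
# difficulty with noise, then picks a tier by threshold.

TRUE_DIFFICULTY = 0.60  # hypothetical "true" hardness of one fixed prompt
NOISE_SD = 0.15         # hypothetical noise in the router's estimate
THRESHOLD = 0.65        # estimates above this go to the deep reasoner

def route_once(rng: random.Random) -> str:
    estimate = rng.gauss(TRUE_DIFFICULTY, NOISE_SD)
    return "reasoner" if estimate > THRESHOLD else "fast"

def routing_distribution(runs: int = 1000, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    counts = {"fast": 0, "reasoner": 0}
    for _ in range(runs):
        counts[route_once(rng)] += 1
    return {tier: n / runs for tier, n in counts.items()}

if __name__ == "__main__":
    # With these made-up parameters, roughly two-thirds of runs land on
    # the fast tier, so the same prompt gets two different treatments.
    print(routing_distribution())
```

The point of the toy model is that even a small amount of noise around the decision threshold makes tier assignment, and therefore output quality, a coin flip the user never sees.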

Sources

GPT-5: It Just Does Stuff
Ethan Mollick, 2025.08.07
Mollick’s 'create an SVG of an otter on a plane' test: about two‑thirds of runs were treated as 'easy' and yielded poor output; the rest triggered a Reasoner and produced better results. A scripted version of this probe is sketched below.
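
A Mollick‑style probe can be scripted as repeated identical calls. This is a minimal sketch assuming the official openai Python client; the 'gpt-5' model name and the idea that the response's model field reveals which sub‑model actually served the call are assumptions not confirmed by the source.

```python
from collections import Counter
from openai import OpenAI  # assumes the official openai Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "create an SVG of an otter on a plane"  # Mollick's test prompt

def probe(runs: int = 20) -> Counter:
    """Send the identical prompt repeatedly and tally routing signals."""
    served = Counter()
    for _ in range(runs):
        resp = client.chat.completions.create(
            model="gpt-5",  # assumed model name
            messages=[{"role": "user", "content": PROMPT}],
        )
        # Assumption: resp.model hints at the routed sub-model; if the
        # API hides routing, latency or token usage are rough proxies.
        served[resp.model] += 1
    return served

if __name__ == "__main__":
    for variant, n in probe().items():
        print(variant, n)
```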