Researchers argue that current AI benchmark leaderboards penalize models for saying 'I don’t know,' pushing them toward confident guessing and thus more hallucinations. Changing the scoring to reward calibrated uncertainty would realign incentives toward trustworthy behavior and better model selection. This reframes hallucinations as partly a measurement problem, not only a training problem.
— If evaluation rules drive model behavior, policy and industry standards must target benchmark design to curb hallucinations and improve reliability.
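As a rough illustration of the scoring change being argued for, the sketch below compares accuracy-only grading with a rule that gives abstentions zero credit and charges a penalty for wrong answers. The function name, the penalty value, and the expected-value arithmetic are illustrative assumptions, not the actual scoring code of any benchmark or of the cited paper.

```python
from typing import Optional


def score_response(answer: Optional[str], correct: str, wrong_penalty: float = 1.0) -> float:
    """Score one benchmark item under an abstention-aware rule (illustrative).

    A correct answer earns 1 point, an explicit abstention (answer=None) earns 0,
    and a wrong answer costs `wrong_penalty` points. With wrong_penalty=0 this
    collapses to accuracy-only scoring, under which guessing never does worse
    than abstaining.
    """
    if answer is None:          # model said "I don't know"
        return 0.0
    return 1.0 if answer == correct else -wrong_penalty


# Expected score on an item the model is only 30% sure about:
p = 0.3
guess_accuracy_only = p * 1.0 + (1 - p) * 0.0    # 0.30 -> guessing beats abstaining (0.0)
guess_with_penalty = p * 1.0 + (1 - p) * -1.0    # -0.40 -> abstaining (0.0) now wins
print(guess_accuracy_only, guess_with_penalty)
```

Under accuracy-only scoring, guessing weakly dominates abstaining at every confidence level, which is the incentive the researchers say pushes models toward confident hallucination; any positive penalty for wrong answers makes calibrated abstention the better strategy below the corresponding confidence threshold.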
msmash
2025.09.17
92% relevant
The article cites OpenAI’s paper stating 'the majority of mainstream evaluations reward hallucinatory behavior' and shows a bot guessing an author’s birthday, echoing the call to redesign leaderboards to reward calibrated 'I don’t know' responses rather than confident guesses.
Arnold Kling
2025.09.12
100% relevant
Adam Tauman Kalai et al.: 'This “epidemic” of penalizing uncertain responses can only be addressed through… modifying the scoring of existing benchmarks… that dominate leaderboards.'