Reward uncertainty in AI benchmarks

Updated: 2025.09.17 · 2 sources
Researchers argue that current AI benchmark leaderboards penalize models for saying 'I don't know,' pushing them toward confident guessing and more hallucinations. Changing scoring to reward calibrated uncertainty would realign incentives toward trustworthy behavior and better model selection. This reframes hallucinations as partly a measurement problem, not only a training problem. If evaluation rules drive model behavior, then policy and industry standards must target benchmark design to curb hallucinations and improve reliability.
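A rough illustration of the incentive argument (a minimal sketch; the scoring rules and numbers below are illustrative assumptions, not taken from the cited paper or any real benchmark): under accuracy-only grading, guessing has non-negative expected score at any confidence, so abstaining is strictly dominated; once wrong answers carry a penalty, abstaining becomes the better move whenever confidence falls below a threshold.

```python
# Illustrative expected-score comparison: guessing vs. abstaining.
# The scoring rules below are hypothetical, not the grading used by any
# specific leaderboard.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering with confidence p_correct,
    where a correct answer earns 1 and a wrong answer costs wrong_penalty."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

ABSTAIN_SCORE = 0.0  # saying "I don't know" earns nothing under either rule

for p in (0.1, 0.3, 0.5, 0.9):
    accuracy_only = expected_score(p, wrong_penalty=0.0)  # accuracy-only grading
    penalized = expected_score(p, wrong_penalty=2.0)      # wrong answers cost 2 points
    print(f"confidence={p:.1f}  accuracy-only guess={accuracy_only:+.2f}  "
          f"penalized guess={penalized:+.2f}  abstain={ABSTAIN_SCORE:+.2f}")

# Accuracy-only: guessing beats abstaining at any confidence above zero, so the
# benchmark rewards confident fabrication. With a wrong-answer penalty t,
# guessing only pays when p_correct > t / (1 + t) (here > 2/3), so a calibrated
# model does better by admitting uncertainty on questions it is unsure about.
```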

Sources

OpenAI Says Models Programmed To Make Stuff Up Instead of Admitting Ignorance
msmash 2025.09.17 92% relevant
The article cites OpenAI’s paper stating 'the majority of mainstream evaluations reward hallucinatory behavior' and shows a bot guessing an author’s birthday, echoing the call to redesign leaderboards to reward calibrated 'I don’t know' responses rather than confident guesses.
Some Very Random Links
Arnold Kling 2025.09.12 100% relevant
Adam Tauman Kalai et al.: 'This “epidemic” of penalizing uncertain responses can only be addressed through… modifying the scoring of existing benchmarks… that dominate leaderboards.'