Researchers argue that current AI benchmark leaderboards penalize models for saying 'I don’t know,' pushing them toward confident guessing and thus more hallucinations. Changing the scoring to reward calibrated uncertainty would realign incentives toward trustworthy behavior and better model selection. This reframes hallucinations as partly a measurement problem, not only a training problem.
— If evaluation rules drive model behavior, policy and industry standards must target benchmark design to curb hallucinations and improve reliability.
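As a rough illustration of the scoring change being argued for, the sketch below compares accuracy-only grading with a rule that gives abstentions zero credit and charges a penalty for wrong answers. The function name, the penalty value, and the expected-value arithmetic are illustrative assumptions, not the actual scoring code of any benchmark or of the cited paper.

```python
from typing import Optional


def score_response(answer: Optional[str], correct: str, wrong_penalty: float = 1.0) -> float:
    """Score one benchmark item under an abstention-aware rule (illustrative).

    A correct answer earns 1 point, an explicit abstention (answer=None) earns 0,
    and a wrong answer costs `wrong_penalty` points. With wrong_penalty=0 this
    collapses to accuracy-only scoring, under which guessing never does worse
    than abstaining.
    """
    if answer is None:          # model said "I don't know"
        return 0.0
    return 1.0 if answer == correct else -wrong_penalty


# Expected score on an item the model is only 30% sure about:
p = 0.3
guess_accuracy_only = p * 1.0 + (1 - p) * 0.0    # 0.30 -> guessing beats abstaining (0.0)
guess_with_penalty = p * 1.0 + (1 - p) * -1.0    # -0.40 -> abstaining (0.0) now wins
print(guess_accuracy_only, guess_with_penalty)
```

Under accuracy-only scoring, guessing weakly dominates abstaining at every confidence level, which is the incentive the researchers say pushes models toward confident hallucination; any positive penalty for wrong answers makes calibrated abstention the better strategy below the corresponding confidence threshold.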
msmash
2025.09.17
92% relevant
The article cites OpenAI’s paper stating 'the majority of mainstream evaluations reward hallucinatory behavior' and shows a bot guessing an author’s birthday, echoing the call to redesign leaderboards to reward calibrated 'I don’t know' responses rather than confident guesses.
Arnold Kling
2025.09.12
100% relevant
Adam Tauman Kalai et al.: 'This “epidemic” of penalizing uncertain responses can only be addressed through… modifying the scoring of existing benchmarks… that dominate leaderboards.'