Humans Fail The LLM Test

A satirical debate has 'Iblis' apply the standard critiques of large‑language models to people: short working memory, reliance on scratchpads, shallow pattern‑matching, and failures of transfer. The gag shows that many 'hallucination' and 'world‑model' complaints fit humans too, suggesting that evaluation artifacts and scaffolding design account for much of the perceived gap in 'understanding'.

Reframing AI deficits as human‑typical failure modes argues for more honest benchmarks, and for testing aids such as scratchpads and prompting, before anyone draws sweeping policy conclusions about AI competence or danger.

Sources

What Is Man, That Thou Art Mindful Of Him?

Scott Alexander 2025.09.02

Lines like 'Without a scratchpad, they only have a working context window of seven plus or minus two' and defenses invoking 'Thinking Mode' mirror the memory critiques and chain‑of‑thought defenses familiar from debates about LLMs.