World‑Model Criterion for AI Risk

Updated: 2026.01.14 · 1 source
Adopt an operational 'world-model' test as a regulatory trigger: measure a model's capacity to form editable internal state representations (e.g., board-state encodings, space/time neurons) and to solve genuinely out-of-distribution tasks. Standardized probes and documented editing/verification experiments would determine when a system crosses from a narrow tool into a governance-sensitive class. A reproducible criterion for detecting internal conceptual models would give policymakers a concrete, evidence-based trigger for stepped safety rules, disclosure requirements, and independent auditing of high-impact AI systems.
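To make the probe-and-edit idea concrete, here is a minimal sketch in the spirit of the Othello board-state experiments the criterion draws on. It is not the cited authors' method: the data is synthetic, and the names (`activations`, `square_labels`) are hypothetical stand-ins for hidden states captured from a model and ground-truth board-state features; the probe is an ordinary scikit-learn logistic regression.

```python
# Hypothetical sketch of a "world-model probe": decode a board-state feature
# from hidden activations, then verify by editing along the probe direction.
# All data below is synthetic; a real audit would use activations recorded
# from the model under test and patch edited states back into the network.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: 2,000 hidden-state vectors (dim 512) with a binary label
# such as "square E4 is occupied by the current player".
n, d = 2000, 512
true_direction = rng.normal(size=d)
activations = rng.normal(size=(n, d))
square_labels = (activations @ true_direction > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    activations, square_labels, test_size=0.25, random_state=0
)

# Step 1: probing. A linear classifier that decodes the board feature from
# hidden states is evidence the feature is linearly represented internally.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")

# Step 2: editing/verification. Push activations along the probe's weight
# vector and confirm the decoded feature flips; in a real experiment the
# edited activations would be patched back into the model to check that
# downstream behaviour (e.g., predicted legal moves) changes accordingly.
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
edited = X_test - 10.0 * np.outer(2 * y_test - 1, direction)
flip_rate = np.mean(probe.predict(edited) != y_test)
print(f"fraction of decoded labels flipped by the edit: {flip_rate:.3f}")
```

A standardized version of this recipe (documented probe architecture, held-out tasks, and an intervention check) is the kind of reproducible evidence the criterion would require before a system is classed as governance-sensitive.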

Sources

Do AI models reason or regurgitate?
Louis Rosenberg, 2026.01.14
The article cites studies (Othello board-state editing; space/time neurons) and a Gemini 3 example of out-of-distribution problem solving as the empirical signals that could be formalized into this criterion.