Measure AI’s opaque reasoning ability by asking how long a problem, measured in the time a human would need to solve it, a model can solve in a single forward pass without chain-of-thought (no-CoT). Track this "no-CoT 50%-reliability time horizon" (the human solve time at which the model succeeds half the time) across frontier models, and report its doubling time as an alignment-relevant capability indicator.
A standardized no-CoT time-horizon metric would give policymakers and safety researchers an empirical, near-term indicator of opaque reasoning capacity, and therefore a concrete trigger for governance, testing, and disclosure requirements.
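The 50% horizon described above can be estimated roughly as follows. This is a minimal sketch assuming a procedure in the style of METR's time-horizon methodology: give each problem an estimated human solve time, fit a logistic curve of no-CoT success against log2 of that time, and read off the length where predicted success is 50%. The function names, inputs, and toy data are hypothetical; this is not the code in the author's repository, which may fit the curve differently.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=100, tol=1e-12):
    """Unregularized maximum-likelihood fit of P(y=1 | x) = sigmoid(a + b*x)
    by Newton-Raphson. Assumes the data are not perfectly separable."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # negative Hessian (Fisher information)
        for x, y in zip(xs, ys):
            p = _sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve [[h_aa, h_ab], [h_ab, h_bb]] @ delta = gradient
        d_a = (h_bb * g_a - h_ab * g_b) / det
        d_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + d_a, b + d_b
        if abs(d_a) + abs(d_b) < tol:
            break
    return a, b


def no_cot_50pct_horizon(human_minutes, solved):
    """Human solve time (minutes) at which predicted no-CoT success is 50%.

    human_minutes: estimated human solve time per problem (hypothetical input)
    solved: 1 if the model answered correctly without chain-of-thought, else 0
    """
    xs = [math.log2(m) for m in human_minutes]
    a, b = fit_logistic(xs, solved)
    if b >= 0:
        raise ValueError("success does not fall with task length; no horizon")
    # sigmoid(a + b*x) = 0.5  <=>  a + b*x = 0  <=>  x = -a/b
    return 2.0 ** (-a / b)


# Synthetic toy data (not from the author's dataset): 10 problems per length.
minutes, solved = [], []
for m, k in [(1, 9), (2, 7), (4, 5), (8, 3), (16, 1)]:
    minutes += [m] * 10
    solved += [1] * k + [0] * (10 - k)
print(round(no_cot_50pct_horizon(minutes, solved), 3))  # 4.0
```

Working in log2 of human time means each unit step in the fitted variable is a doubling of task length, which matches how the horizon and its doubling time are reported.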
ryan_greenblatt
2026.01.09
Opus 4.5 has a measured no-CoT 50% horizon of 3.5 minutes, and the horizon has doubled roughly every 9 months (author’s dataset of 907 mostly easy competition math problems; repo: github.com/rgreenblatt/no_cot_math_public).
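The doubling-time arithmetic can be sketched as follows, assuming the horizon grows exponentially with model release date: fit log2(horizon) against release date by ordinary least squares, and the reciprocal of the slope is the doubling time in months. The function names and inputs are hypothetical, and the author's repository may compute the trend differently. The extrapolation simply continues the fitted trend and is an illustration, not a forecast.

```python
import math


def doubling_time_months(release_months, horizons_min):
    """Least-squares fit of log2(horizon) against release date.

    release_months: release dates in months since an arbitrary origin (hypothetical input)
    horizons_min: each model's no-CoT 50% horizon in minutes
    Returns months per doubling, i.e. the reciprocal of the fitted slope.
    """
    n = len(release_months)
    ys = [math.log2(h) for h in horizons_min]
    mx = sum(release_months) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in release_months)
    sxy = sum((x - mx) * (y - my) for x, y in zip(release_months, ys))
    if sxy <= 0:
        raise ValueError("horizon is not growing; doubling time undefined")
    return sxx / sxy  # equals 1 / slope


def extrapolate_horizon(h0_min, doubling_months, months_ahead):
    """Horizon after months_ahead if the exponential trend simply continues."""
    return h0_min * 2.0 ** (months_ahead / doubling_months)


# Toy series with an exact 9-month doubling (synthetic, not measured data).
print(doubling_time_months([0, 9, 18], [1.0, 2.0, 4.0]))  # 9.0
# Plugging in the reported figures: 3.5 minutes, doubling every ~9 months.
print(extrapolate_horizon(3.5, 9, 18))  # 14.0
```

Fitting in log space turns a constant doubling time into a straight line, so a single slope summarizes the whole series of frontier models.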