No‑CoT Time‑Horizon Metric

Measure a model's opaque reasoning capacity by asking how long a problem, measured in the time a human would need to solve it, the model can reliably solve in a single forward pass, with no chain‑of‑thought (CoT). Track this 'no‑CoT 50% reliability time horizon' (the human solve time at which the model succeeds on half of the problems) across frontier models, and report its doubling time as an alignment‑relevant capability indicator. A standardized no‑CoT time‑horizon metric would give policymakers and safety researchers an empirical, near‑term indicator of opaque reasoning capacity, and therefore a concrete trigger for governance, testing, and disclosure requirements.
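One way to make the 50% horizon concrete is sketched below. This is not the source author's code. It assumes each problem comes with an estimated human solve time in minutes and a 0/1 outcome from a no‑CoT attempt, and it fits a logistic curve of success against log2(minutes), then reads off where that curve crosses 50%. The names fit_logistic and horizon_50 are hypothetical, chosen for illustration.

```python
import math


def fit_logistic(log2_minutes, solved, iterations=25):
    """Fit P(solved) = sigmoid(a + b * log2(minutes)) by Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g_*) and negated Hessian (h_*) of the log-likelihood.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(log2_minutes, solved):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 1e-12:
            break  # degenerate data (e.g. all problems the same length)
        # Newton step: solve the 2x2 system in closed form.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def horizon_50(records):
    """Return the human solve time (minutes) at which fitted success is 50%.

    records: iterable of (human_minutes, solved) pairs, solved in {0, 1}.
    This is an assumed data layout, not the format of any published dataset.
    """
    records = list(records)
    xs = [math.log2(minutes) for minutes, _ in records]
    ys = [float(outcome) for _, outcome in records]
    a, b = fit_logistic(xs, ys)
    if b >= 0:
        raise ValueError("success rate does not fall with problem length")
    # sigmoid(a + b * x) = 0.5 exactly when x = -a / b.
    return 2.0 ** (-a / b)
```

Fitting against log time rather than raw time is a design choice in this sketch: it treats a doubling of problem length as the same step everywhere on the scale, which matches reporting the trend as a doubling time. The fit needs outcomes that are not perfectly separated by problem length; with perfectly separated data the slope grows without bound.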

Sources

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt 2026.01.09 100% relevant

Reports a measured no‑CoT 50% time horizon of 3.5 minutes for Opus 4.5, with a doubling time of roughly 9 months, based on the author’s dataset of 907 mostly easy competition math problems (repo: github.com/rgreenblatt/no_cot_math_public).
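The doubling-time arithmetic can be sketched as follows. It assumes a list of (months since a reference date, horizon in minutes) observations, one per model, and fits a straight line to log2(horizon) over time; the doubling time is the reciprocal of the slope. The function names are hypothetical, and the projection helper assumes the exponential trend simply continues, which the source does not guarantee.

```python
import math


def doubling_time_months(observations):
    """Least-squares doubling time from (months, horizon_minutes) pairs."""
    xs = [months for months, _ in observations]
    ys = [math.log2(horizon) for _, horizon in observations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of log2(horizon) per month, i.e. doublings per month.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 / slope


def project_horizon(current_minutes, doubling_months, months_ahead):
    """Extrapolate a horizon, assuming a constant doubling time."""
    return current_minutes * 2.0 ** (months_ahead / doubling_months)
```

Under that constant-trend assumption, the figures quoted above (3.5 minutes, roughly 9-month doubling) would give project_horizon(3.5, 9, 9) = 7.0 minutes nine months later. That is an illustration of the arithmetic, not a forecast from the source.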