ARC Agent Speed Test

By launching ARC‑AGI/3 to measure how fast agents learn new tasks, evaluators pivot from static benchmarks to sample‑efficient generalization as a capability signal; this creates a clearer lever for gating deployments and claims. — A learning‑speed standard would shape safety thresholds, capability reporting, and regulatory triggers for agentic systems, influencing procurement, audits, and public risk communication.

Sources

Links for 2025-07-22

Alexander Kruel 2025.07.22 100% relevant

The roundup notes ARC unveiling a benchmark specifically to test how quickly agents learn new tasks.

Links for 2025-07-16

Alexander Kruel 2025.07.16 75% relevant

METR’s finding that AI agents’ time horizon is doubling every ~7 months across domains shifts evaluation toward dynamic capability measures (task length/time horizon), echoing the move from static benchmarks to learning‑ and generalization‑centric metrics.