Multiple recent experiments show that extremely small transformers (hundreds of parameters) can learn long addition and generalize to fresh test data, with information‑theoretic checks ruling out memorization. That suggests the architecture can discover compact algorithmic solutions, not just statistical associations.
— If transformers can internalize algorithms at tiny scale, capability forecasts, interpretability research, safety timelines, and the economics of on‑device AI all need revising.
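One crude way to see why memorization is implausible at this scale is a capacity comparison. The sketch below is only an illustrative assumption, not the analysis from the cited papers: it bounds the information a model could store by parameter count times bit width, then compares that with the bits needed to memorize answers to fresh 10‑digit addition problems.

```python
import math

# Illustrative back-of-envelope check (assumed setup, not the papers' exact method):
# compare a crude upper bound on storable information with the information
# needed to memorize answers to random 10-digit addition problems.

def model_capacity_bits(n_params: int, bits_per_param: float = 16.0) -> float:
    """Crude upper bound: each parameter stores at most its bit width."""
    return n_params * bits_per_param

def bits_to_memorize(n_examples: int, n_digits: int = 10) -> float:
    """Bits needed to store answers to n random n_digits-digit additions.
    Each sum has up to n_digits + 1 decimal digits, ~log2(10) bits per digit."""
    return n_examples * (n_digits + 1) * math.log2(10)

for n_params in (456, 777):
    cap = model_capacity_bits(n_params)
    # How many answers a pure lookup table of this size could hold at most.
    max_memorized = int(cap // ((10 + 1) * math.log2(10)))
    print(f"{n_params} params: <= {cap:.0f} bits of capacity, "
          f"enough to memorize at most ~{max_memorized} answers; "
          f"a fresh test set of 10,000 problems alone would need "
          f"~{bits_to_memorize(10_000):,.0f} bits.")
```

Under these assumptions, a few-hundred-parameter model can hold at most a few hundred memorized answers, so near-perfect accuracy on a large fresh test set points to a learned algorithm rather than a lookup table.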
Alexander Kruel
2026.02.25
100% relevant
Papers hosted on GitHub reporting a ~777‑parameter and a 456‑parameter transformer solving 10‑digit addition, plus an information‑theoretic analysis (cited in the article).