Treat candidate programs, prompts, or model inputs as a population and use an LLM to propose targeted mutations; evaluate with an external score, keep the fittest, and repeat — producing cumulative capability gains across generations. Imbue’s Darwinian Evolver applied this pattern to ARC‑AGI‑2 and achieved large, verifiable jumps in benchmark performance for multiple models.
— If LLMs can reliably serve as mutation engines that improve other models or artifacts, that creates a low‑friction path to capability improvements and raises practical questions about governance, competitive dynamics, and safety oversight.
Alexander Kruel
2026.02.27
100% relevant
Imbue’s research posts and GitHub (Darwinian Evolver) plus ARC‑AGI‑2 score changes (Kimi K2.5 12%→34%, Gemini 3 Flash 34%→61%, Gemini 3.1 Pro 95%).
← Back to All Ideas