A new Remote Labor Index test from Scale AI and the Center for AI Safety gave hundreds of real, paid freelance tasks to leading AI systems and found that the best model fully completed only about 2.5% of assignments, with roughly half of the outputs either poor in quality or left incomplete. Failures included corrupt outputs, mishandled visuals, missing data, and brittle memory: concrete limits on current automation capacity.
If replicated, these findings should temper near‑term job‑elimination narratives, redirect policy toward augmentation, verification standards, and targeted retraining, and shape who bears liability when AI is deployed on real economic tasks.
EditorDavid
2026.01.10
Source: Remote Labor Index study reported in the Washington Post; the models tested (ChatGPT, Gemini, Claude) succeeded on roughly 2.5% of real freelancing gigs, with failures including corrupt files, missing data, and visual errors.