Google used the same general Gemini 2.5 model found in consumer apps, not a custom‑trained contest version, and simply enabled extended 'thinking tokens' over the five‑hour window. With more test‑time compute for deliberation, it solved 10 of 12 problems, earning a gold medal alongside only four human teams. This suggests that a runtime reasoning budget can substitute for bespoke training in reaching elite performance.
— If test‑time compute can unlock top‑tier problem solving, governance, cost, and safety may hinge as much on runtime inference budgets as on model training.
BeauHD
2025.09.17
Google says Gemini 2.5 was 'enhanced' only by letting it churn out thinking tokens for the entire contest, and it still achieved a gold‑level result.