**GPT-5 Cracks Open Math Problem as Self-Improving AI Agents Break Training Barriers**

AI Research Breakthroughs

Software Agents Achieve Self-Improvement Without Human-Labeled Data
A study demonstrates that software agents can autonomously improve their performance on the SWE-bench benchmark without relying on human-labeled data. The CWM-sft + SSR method showed the largest gains, highlighting progress in AI's ability to self-optimize for specialized tasks.

GPT-5 Solves Open Math Problem in Enumerative Geometry
GPT-5 autonomously solved a previously open problem in enumerative geometry, producing a complete and correct solution without human intervention. This marks a milestone in AI's potential to contribute to advanced mathematical research.


AI Benchmarks & Evaluation

METR Highlights Gap Between AI Benchmarks and Economic Impact
Joel Becker of METR discusses the disconnect between current AI capability benchmarks and real-world economic impact. The analysis underscores the need for more holistic evaluation frameworks that better assess AI's practical utility.


Open-Source AI Models

MiniMax M2.1 Released as Open-Source SOTA for Real-World Development
MiniMax M2.1, now open-source, achieves state-of-the-art performance on coding benchmarks (SWE, VIBE, Multi-SWE), surpassing models such as Gemini 3 Pro and Claude Sonnet 4.5. It is optimized for real-world development and agentic tasks.