14 Nov 2025

GPT-5.1 Benchmarks Leak as MAKER Achieves Million-Step, Error-Free LLM Reasoning

AI Models & Research Breakthroughs

Google DeepMind - SIMA 2: An AI Agent for Virtual 3D Worlds
Google DeepMind unveiled SIMA 2, an AI agent that can play, reason, and self-improve in virtual 3D environments. The model learns by trial and error, using feedback from Gemini, so it can adapt to new games without additional human data. That makes it both more adaptable and cheaper to train.

Google DeepMind - SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds
- DeepMind Blog

MAKER: Million-Step, Zero-Error LLM Reasoning
Researchers introduced MAKER, a system that enables million-step, error-free reasoning with LLMs. It decomposes a task into small subtasks, assigns each one to a microagent, and uses a multi-agent voting scheme to catch and correct errors. The authors present this as a scalable approach to problems at the scale of whole organizations.

Shattering the Illusion: MAKER Achieves Million-Step, Zero-Error LLM Reasoning
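
The decompose-and-vote idea in the MAKER summary can be illustrated with a minimal Python sketch. This is a plain strict-majority vote over toy stub agents, not MAKER's actual voting rule or code. The names `Microagent`, `vote_on_subtask`, `solve`, `reliable`, and `faulty` are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical type: a "microagent" is any callable mapping a subtask to an answer.
Microagent = Callable[[str], Hashable]


def vote_on_subtask(subtask: str, agents: Sequence[Microagent]) -> Hashable:
    """Ask every agent for an answer and return the strict-majority answer."""
    answers = [agent(subtask) for agent in agents]
    winner, count = Counter(answers).most_common(1)[0]
    # Fail loudly when no answer has more than half of the votes.
    if count * 2 <= len(answers):
        raise ValueError(f"no majority for subtask {subtask!r}: {answers}")
    return winner


def solve(subtasks: Sequence[str], agents: Sequence[Microagent]) -> list:
    """Vote on each subtask independently and collect the answers in order."""
    return [vote_on_subtask(subtask, agents) for subtask in subtasks]


# Toy stand-ins for LLM calls (assumptions, not real model behavior).
def reliable(subtask: str) -> int:
    left, right = subtask.split("+")
    return int(left) + int(right)


def faulty(subtask: str) -> int:
    # Always off by one, to simulate an agent that makes an error.
    return reliable(subtask) + 1


results = solve(["1+2", "10+5", "7+7"], [reliable, faulty, reliable])
print(results)
```

Because each subtask is voted on separately, one faulty agent is outvoted at every step and its error never propagates into later steps. With real LLM calls the agents would be independent samples, and the vote threshold would need tuning to the observed per-step error rate.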

OpenAI’s Weight-Sparse Transformer: A Step Toward Interpretable AI
OpenAI released an experimental weight-sparse transformer designed to be more interpretable than conventional dense LLMs. It is smaller and less capable than GPT-5, but it offers insight into the inner workings of neural networks, which could help researchers address issues like hallucinations and misalignment.

OpenAI’s new LLM exposes the secrets of how AI really works
- OpenAI Blog
- MIT Technology Review

LLM Updates & Benchmarks

GPT-5.1: Performance Benchmarks and Capabilities
Early, leaked benchmark results for GPT-5.1 show significant improvements over GPT-5 in reasoning and contextual understanding, and on specialized benchmarks such as SWE-bench Verified and GPQA Diamond. The release also brings enhanced coding assistance through GPT-5.1-Codex.

GPT 5.1 Benchmarks
5.1-codex spotted
Quick benchmark on GPT-5.1-Codex
- Note: Benchmarks suggest Claude Sonnet 4.5 (non-thinking) outperforms GPT-5.1-Codex on select tasks.

Developer Tools & Platforms

Roo Code 3.32.0: Integration of GPT-5.1 and Free Tier Updates
The latest Roo Code update (v3.32.0) adds GPT-5.1 model support, free access to MiniMax M2 on Roo Code Cloud, extended OpenAI prompt caching, and bug fixes.

Roo Code 3.32.0 – GPT-5.1, FREE MiniMax M2 on Roo Code Cloud, extended OpenAI prompt caching
- Release Notes