**Claude Opus 4.5 Reaches 4-Hour 49-Minute METR Time Horizon; Rakuten’s 700B Model Challenges U.S. AI Dominance**
AI Model Releases & Benchmarks
Google DeepMind’s "Flash" Outperforms "Pro" Thanks to Agentic Reinforcement Learning (RL)
- A Google DeepMind research scientist said the "Flash" model surpasses "Pro" because of advances in agentic RL; reported SWE-Bench Verified results show it outperforming GPT-5.2 and Opus 4.5.
Claude Opus 4.5 Achieves 4-Hour 49-Minute Time Horizon on METR Benchmark
- Claude Opus 4.5 reached a 50% time horizon of 4 hours 49 minutes on METR's software engineering tasks, meaning it succeeds about half the time on tasks that take human experts that long. The result marks significant progress in long-horizon AI capabilities.
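For readers unfamiliar with the metric, here is a minimal sketch of how a 50% time horizon can be estimated: fit a logistic curve of success against the log of human task length, then read off the length where predicted success is 50%. The task data below is hypothetical and made up for illustration, and the plain gradient-ascent fit is an assumption, not METR's actual pipeline.

```python
import math

# Hypothetical (human_minutes, model_succeeded) pairs, invented for illustration.
TASKS = [
    (2, 1), (4, 1), (8, 1), (15, 1), (30, 1),
    (60, 0), (60, 1), (120, 1), (240, 0), (240, 1),
    (480, 0), (480, 1), (960, 0), (1920, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b * x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(a + b * x)
            grad_a += err
            grad_b += err * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Work in log2(minutes), centered on the mean so the fit converges quickly.
log_lengths = [math.log2(minutes) for minutes, _ in TASKS]
mean_log = sum(log_lengths) / len(log_lengths)
centered = [x - mean_log for x in log_lengths]
outcomes = [ok for _, ok in TASKS]

a, b = fit_logistic(centered, outcomes)

# The 50% horizon is where a + b * x = 0, converted back to minutes.
horizon_minutes = 2 ** (mean_log - a / b)
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

The negative slope captures the intuition behind the metric: success probability falls as tasks get longer, and the horizon is the crossover point.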
Mistral’s Vibe (with devstral 2) vs. Claude Code on SWE-bench-mini
- Mistral’s Vibe (with devstral 2) scored 37.6% on SWE-bench-mini versus Claude Code’s 39.8%; the 2.2-point gap is reported as not statistically significant.
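To see why a 2.2-point gap can be statistically indistinguishable, here is a rough two-proportion z-test. The source gives neither the sample size nor the test that was used, so `N_RUNS` is a purely hypothetical assumption; if both tools ran on the same tasks, a paired test would be more appropriate.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference between two pass rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

N_RUNS = 500  # hypothetical number of scored attempts per tool (not from the source)

z, p_value = two_proportion_z(0.376, N_RUNS, 0.398, N_RUNS)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under this assumed sample size the difference falls well short of the usual 5% significance threshold; a much larger evaluation would be needed before the same gap counted as evidence of a real difference.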
New AI Tools & Frameworks
NVIDIA Releases NitroGen: Open-Source Vision-to-Action Model
- NitroGen is a new model that plays video games from raw frames using large-scale imitation learning on human gameplay data, showcasing potential for embodied AI.
Aider-ce: Enhanced AI Coding CLI with Agent Mode & MCP Support
- Aider-ce, the successor to Aider, introduces agent mode, MCP (Model Context Protocol) support, and skills for cost-efficient, token-light coding assistance.
FlashHead: 50% Faster Token Generation for Small Language Models (SLMs)
- FlashHead is a drop-in replacement for LM heads, using information retrieval to accelerate token generation without sacrificing accuracy.
Product Updates & Integrations
Roo Code 3.36.7–3.36.16: Gemini 3 Flash Preview & Native Tools by Default
- These releases enable native tools by default for more providers and models, add a Gemini 3 Flash preview, and improve troubleshooting of chat errors.
Qwen-Image-Layered: Photoshop-Grade Layered Image Editing
- Qwen’s new model decomposes an image into editable RGBA layers, with prompt-controlled structure and recursive ("infinite") decomposition, in which any layer can itself be split further, for advanced image editing.
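As background on why RGBA layers make editing convenient, here is a small sketch of the standard Porter-Duff "over" operator, which recombines a stack of layers into one image. This is generic alpha compositing rather than anything specific to Qwen-Image-Layered; the function names and the pixel representation (straight-alpha floats in the range 0 to 1) are assumptions for illustration.

```python
def over(top, bottom):
    """Composite one straight-alpha RGBA pixel over another (Porter-Duff 'over')."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1 - ta)
    if out_a == 0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result

    def blend(t, b):
        return (t * ta + b * ba * (1 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)

def flatten_pixel(layers_bottom_to_top):
    """Collapse a stack of RGBA pixels (bottom layer first) into one pixel."""
    result = (0.0, 0.0, 0.0, 0.0)
    for layer in layers_bottom_to_top:
        result = over(layer, result)
    return result

background = (1.0, 0.0, 0.0, 1.0)  # opaque red
overlay = (0.0, 0.0, 1.0, 0.5)     # half-transparent blue

flattened = flatten_pixel([background, overlay])
print(flattened)  # (0.5, 0.0, 0.5, 1.0)
```

Because each layer keeps its own alpha channel, one layer can be recolored, moved, or removed and the stack re-flattened without touching the pixels of the others.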
Industry & Strategic Developments
Google Faces Compute Constraints, Forms Allocation Council
- Google’s compute crunch has led to a high-level council managing capacity, impacting AI strategy for the next 12–18 months despite heavy investments.
Rakuten to Release 700B Open-Weight Model in Spring 2026
- Japan’s Rakuten plans a 700B-parameter open-weight model, potentially rivaling Chinese models and pressuring U.S. firms to scale up.
Emerging Models & Previews
MiniMax-M2.1: Next-Gen Model with 3D Particle System Demo
- M2.1, the upcoming version of MiniMax-M2, was previewed with an interactive 3D particle system, hinting at improved performance and efficiency.