Claude Opus 4.5 & Qwen3-235B Smash Benchmarks as AI Agents Reshape Coding Future
New AI Models & Benchmarks
Anthropic Releases Claude Opus 4.5, Leading in Coding and Agentic Tasks: Anthropic has launched Claude Opus 4.5, a state-of-the-art coding model priced at $5/$25 per million tokens (input/output). Benchmarks show it excels in agentic coding, tool use, and novel problem-solving, reclaiming the top spot on SWE-bench (though narrowly) while reducing costs compared to previous versions.
Qwen3-235B-A22B Tops EsoBench, Outperforming Claude Opus 4.5 in Esolang Learning: Alibaba’s Qwen3-235B-A22B achieved state-of-the-art results on EsoBench, a benchmark testing an AI’s ability to learn and execute esoteric programming languages (esolangs), surpassing Claude Opus 4.5.
Microsoft Unveils Fara-7B: A Lightweight Agentic Model for Computer Automation: Microsoft released Fara-7B, a 7B-parameter multimodal model designed for agentic computer use. It processes image and text inputs to predict actions with grounded reasoning, achieving SOTA performance in its size class.
AI Product & Service Launches
Andrew Ng’s AI Reviewer Matches Human-Level Performance for Research Papers: Coursera co-founder Andrew Ng launched PaperReview.ai, an AI tool trained on ICLR 2025 reviews that provides human-level feedback on academic papers, aiming to accelerate the peer-review process.
Major AI Announcements & Predictions
Anthropic Engineer Predicts End of Traditional Software Engineering by Mid-2026: An Anthropic engineer claimed that by the first half of 2026, AI-generated code will be reliable enough that developers won’t need to manually verify it, akin to trusting compiler output today.
Elon Musk Teases Grok 5 with Live Video Input & Real-Time Computer Control: Elon Musk hinted that Grok 5 will support live video input and real-time computer interaction, suggesting a major leap toward multimodal, dynamic AI agents.