
Claude Opus 4.5 & Qwen3-235B Smash Benchmarks as AI Agents Reshape Coding Future

New AI Models & Benchmarks

Anthropic Releases Claude Opus 4.5, Leading in Coding and Agentic Tasks: Anthropic has launched Claude Opus 4.5, a state-of-the-art coding model priced at $5 per million input tokens and $25 per million output tokens. Benchmarks show it excels in agentic coding, tool use, and novel problem-solving, narrowly reclaiming the top spot on SWE-bench while costing less than previous versions.
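To make the quoted pricing concrete, here is a minimal sketch of the arithmetic, assuming the $5/$25 per-million-token list prices above apply directly with no discounts or caching. The helper name `estimate_cost_usd` and the example token counts are hypothetical, chosen only for illustration.

```python
# List prices quoted in the item above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the list-price cost of one request."""
    total = (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    )
    return total / 1_000_000


# Illustrative request: 200,000 input tokens and 20,000 output tokens.
print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints "$1.50"
```

Output tokens cost five times as much as input tokens at these rates, so long generations can dominate the bill even when the prompt is much larger.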


Qwen3-235B-A22B Tops EsoBench, Outperforming Claude Opus 4.5 in Esolang Learning: Alibaba’s Qwen3-235B-A22B achieved state-of-the-art results on EsoBench, a benchmark that tests an AI model’s ability to learn and execute esoteric programming languages (esolangs), surpassing Claude Opus 4.5.


Microsoft Unveils Fara-7B: A Lightweight Agentic Model for Computer Automation: Microsoft released Fara-7B, a 7B-parameter multimodal model designed for agentic computer use. It takes images and text as input and predicts actions with grounded reasoning, achieving state-of-the-art performance in its size class.


AI Product & Service Launches

Andrew Ng’s AI Reviewer Matches Human-Level Performance for Research Papers: Coursera co-founder Andrew Ng launched PaperReview.ai, an AI tool trained on ICLR 2025 reviews that provides human-level feedback on academic papers, aiming to accelerate the peer-review process.


Major AI Announcements & Predictions

Anthropic Engineer Predicts End of Traditional Software Engineering by Mid-2026: An Anthropic engineer claimed that by the first half of 2026, AI-generated code will be reliable enough that developers won’t need to verify it manually, much as they trust compiler output today.


Elon Musk Teases Grok 5 with Live Video Input & Real-Time Computer Control: Elon Musk hinted that Grok 5 will support live video input and real-time computer interaction, suggesting a major leap toward multimodal, dynamic AI agents.