25 Nov 2025

Claude Opus 4.5 & Qwen3-235B Smash Benchmarks as AI Agents Reshape Coding Future

New AI Models & Benchmarks

Anthropic Releases Claude Opus 4.5, Leading in Coding and Agentic Tasks: Anthropic has launched Claude Opus 4.5, a state-of-the-art coding model priced at $5 per million input tokens and $25 per million output tokens. Benchmarks show it excels in agentic coding, tool use, and novel problem-solving. It narrowly reclaims the top spot on SWE-bench while costing less than previous versions.
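To make the per-token pricing concrete, here is a minimal sketch of the arithmetic in Python. Only the $5 and $25 rates come from the announcement. The function name `request_cost_usd` and the token counts are hypothetical, chosen for illustration, and the sketch assumes flat pricing with no caching or batch discounts.

```python
# Rates quoted in the announcement, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumes flat per-token pricing (no caching or batch discounts).
    """
    total = (input_tokens * INPUT_PRICE_PER_MTOK
             + output_tokens * OUTPUT_PRICE_PER_MTOK)
    return total / 1_000_000


# Hypothetical agentic coding request: 200k tokens read, 20k tokens written.
cost = request_cost_usd(200_000, 20_000)
print(f"${cost:.2f}")  # prints $1.50
```

Output tokens cost five times as much as input tokens, so in this example the 20k generated tokens account for a third of the bill despite being a tenth of the input volume.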

Qwen3-235B-A22B Tops EsoBench, Outperforming Claude Opus 4.5 in Esolang Learning: Alibaba’s Qwen3-235B-A22B achieved state-of-the-art results on EsoBench, a benchmark that tests a model’s ability to learn and execute esoteric programming languages (esolangs), surpassing Claude Opus 4.5.

Qwen3-235B-A22B achieves SOTA in EsoBench
- Casey’s Evaluations
- Esoteric Programming Languages (Wikipedia)

Microsoft Unveils Fara-7B: A Lightweight Agentic Model for Computer Automation: Microsoft released Fara-7B, a 7B-parameter multimodal model designed for agentic computer use. It takes images and text as input and predicts actions with grounded reasoning, achieving state-of-the-art performance in its size class.

Fara-7B: An Efficient Agentic Model for Computer Use

AI Product & Service Launches

Andrew Ng’s AI Reviewer Matches Human-Level Performance for Research Papers: Coursera co-founder Andrew Ng launched PaperReview.ai, an AI tool trained on ICLR 2025 reviews that provides human-level feedback on academic papers, with the aim of accelerating peer review.

Andrew Ng Drops Human-Level AI Reviewer
- PaperReview.ai

Major AI Announcements & Predictions

Anthropic Engineer Predicts End of Traditional Software Engineering by Mid-2026: An Anthropic engineer claimed that by the first half of 2026, AI-generated code will be reliable enough that developers will no longer need to verify it manually, much as they trust compiler output today.

Software Engineering "Done" First Half of Next Year

Elon Musk Teases Grok 5 with Live Video Input & Real-Time Computer Control: Elon Musk hinted that Grok 5 will support live video input and real-time computer interaction, suggesting a major leap toward multimodal, dynamic AI agents.

Grok 5 to Feature Live Video + Computer Use
- Elon Musk’s Tweet