Intellect-3 & Open-Source AI Beat Humans on ARC-AGI as Mistral Teases New Model
New AI Models & Releases
Intellect-3: A 106B Parameter Mixture-of-Experts Reasoning Model
Intellect-3 is a newly released 106B-parameter MoE model based on GLM 4.5 Air, optimized for long chain-of-thought reasoning, single-turn Python coding, and scientific tasks. It delivers strong benchmark performance and runs efficiently on H200 GPUs.
Qwen3 Next Nears Release in llama.cpp
Support for Qwen3 Next is nearing completion in llama.cpp, bringing the usual local-inference optimizations and performance improvements. GGUF builds of the model are already available for local deployment.
Mistral Teases Upcoming Model Release
Mistral AI may soon launch a new model (potentially "Mistral Medium 3.2"), as hinted by a cloaked model on OpenRouter showing improved performance over prior versions.
AI Research & Breakthroughs
Open-Source AI Surpasses Humans on ARC-AGI Benchmark
German researchers developed an AI system achieving 71.6% on ARC-AGI (vs. a human average of about 70%) using techniques such as Product of Experts scoring and depth-first search, at a cost of just $0.02 per task.
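The researchers' actual implementation is not reproduced here, but the two named techniques compose naturally: each "expert" assigns a probability to a candidate choice, the Product of Experts score multiplies those probabilities, and a depth-first search fills in a candidate answer one cell at a time, pruning any branch whose running product already falls below the best complete solution found. A minimal sketch with hard-coded toy experts (the tables, color set, and function names are illustrative assumptions, not the paper's code):

```python
# Toy "experts": each maps (cell_index, color) -> probability.
# In the real system these would be model-derived scores over
# candidate ARC grid completions; here they are hard-coded tables.
EXPERT_A = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}
EXPERT_B = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.8}
EXPERTS = [EXPERT_A, EXPERT_B]
COLORS = [0, 1]

def poe_score(cell, color):
    """Product of Experts: multiply every expert's probability."""
    score = 1.0
    for expert in EXPERTS:
        score *= expert[(cell, color)]
    return score

def dfs(num_cells):
    """Depth-first search over cell colorings, keeping the best product."""
    best = {"score": 0.0, "grid": None}

    def recurse(cell, partial, score):
        if score <= best["score"]:          # prune dominated branches
            return
        if cell == num_cells:               # complete candidate found
            best["score"], best["grid"] = score, list(partial)
            return
        # Try higher-scoring colors first so pruning kicks in sooner.
        for color in sorted(COLORS, key=lambda c: -poe_score(cell, c)):
            partial.append(color)
            recurse(cell + 1, partial, score * poe_score(cell, color))
            partial.pop()

    recurse(0, [], 1.0)
    return best["grid"], best["score"]

grid, score = dfs(num_cells=2)
print(grid, round(score, 4))  # -> [0, 1] 0.2016
```

Because the product of probabilities only shrinks as cells are filled in, any partial solution already below the best complete score can be cut off immediately, which is what keeps the per-task cost low.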
Anthropic’s AI Agent Framework for Long Projects
Anthropic introduced a method to stabilize AI agents for long-term projects using an initializer agent (project setup) and a coding agent (task execution), preventing task drift.
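Anthropic has not published reference code for this pattern, but the division of labor can be sketched roughly as follows: the initializer agent runs once to produce a persistent project context, and the coding agent is re-grounded in that context on every task so it cannot drift from the original plan. All names here, including the `llm` stub, are hypothetical:

```python
def llm(prompt: str) -> str:
    """Placeholder model call; a real system would query an LLM API."""
    return f"[response to: {prompt[:40]}]"

def initializer_agent(goal: str) -> dict:
    """Runs once: sets up the project context that anchors later work."""
    return {
        "goal": goal,
        "plan": llm(f"Break this goal into tasks: {goal}"),
        "conventions": llm(f"Pick coding conventions for: {goal}"),
    }

def coding_agent(project: dict, task: str) -> str:
    """Runs per task: always re-reads the initializer's output,
    so each task is executed against the same fixed plan."""
    prompt = (
        f"Goal: {project['goal']}\n"
        f"Plan: {project['plan']}\n"
        f"Conventions: {project['conventions']}\n"
        f"Current task: {task}"
    )
    return llm(prompt)

project = initializer_agent("build a CLI todo app")
result = coding_agent(project, "implement the add command")
```

The key design point is that the coding agent never accumulates its own drifting summary of the project; the initializer's output is the single source of truth passed into every call.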
Anthropic Reports R&D Evaluation Saturation
Anthropic reports its internal AI evaluations are nearing saturation, with models scoring near the ceiling; further progress will be hard to measure without new, harder benchmarks.
AI Tools & Infrastructure
OpenWhisper: Free, Local Audio Transcription
OpenWhisper is a new open-source transcription tool built on OpenAI’s Whisper model running locally, enabling private, offline audio-to-text transcription.
NornicDB: 4x Faster Neo4j Alternative (MIT Licensed)
NornicDB is a Go-based drop-in replacement for Neo4j, offering 4x faster performance, lower memory usage, and quicker load times.
AI Benchmarks & Comparisons
GPT-5.1 vs. Gemini 3.0 vs. Opus 4.5: Coding Task Showdown
A comparison of GPT-5.1, Gemini 3.0, and Opus 4.5 across coding tasks reveals:
- GPT-5.1: Best at safeguards/validation.
- Gemini 3.0: Cost-effective but less thorough.
- Opus 4.5: Fastest, most complete (but pricier).
AI Theory & Concepts
The "Jagged Frontier" of AI Progress
A discussion of AI’s uneven development (the "jagged frontier"), where models excel at some tasks while failing at superficially similar ones, and what that unevenness implies for the path from narrow AI to potential AGI.