12 Nov 2025

VibeThinker-1.5B Outperforms Larger Models as Baidu’s ERNIE-4.5 Redefines Visual AI

New AI Models & Benchmarks

VibeThinker-1.5B: A 1.5B Parameter Model Outperforming Larger Models in Math & Coding: The newly released VibeThinker-1.5B achieves state-of-the-art performance among small models (<4B) in competitive math and coding benchmarks, surpassing DeepSeek R1 0120. The model emphasizes strict training data decontamination and is available for community testing.

We put a lot of work into a 1.5B reasoning model — now it beats bigger ones on math & coding benchmarks

Moonshot AI’s Kimi K2 Benchmarks Questioned for Real-World Applicability: Users and analysts report gaps between K2’s benchmark scores and its practical performance, particularly on coding and lambda-calculus tasks, prompting debate about possible "benchmaxxing."

Seems like the new K2 benchmarks are not too representative of real-world performance

Baidu Releases ERNIE-4.5-VL-28B-A3B-Thinking with "Visual Reasoning": Baidu’s new "thinking" variant of ERNIE-4.5-VL-28B introduces advanced visual analysis capabilities, potentially outperforming models like Gemini 2.5 Pro and GPT-5 High in benchmarks.

baidu/ERNIE-4.5-VL-28B-A3B-Thinking released. Curious case..

Reflection AI Achieves Human-Level Performance on ARC-AGI v1 for Under $10k: The open-source Reflection AI model reached 85% accuracy on the ARC-AGI v1 benchmark in just 12 hours, demonstrating cost-efficient human-level performance.

Reflection AI reached human-level performance (85%) on ARC-AGI v1 for under $10k and within 12 hours. You can run this code yourself, it’s open source.
- GitHub Repository

AI Hardware & Infrastructure

Olares Launches $3K MiniPC for Local AI with RTX 5090 Mobile (24GB VRAM): Startup Olares unveils a compact 3.5L MiniPC featuring an RTX 5090 Mobile GPU and 96GB DDR5 RAM, designed for high-performance local AI workloads.

A startup Olares is attempting to launch a small 3.5L MiniPC dedicated to local AI, with RTX 5090 Mobile (24GB VRAM) and 96GB of DDR5 RAM for $3K
- TechPowerUp Coverage

AI Tools & Developer Innovations

Nano Banana 2 Generates Hyper-Realistic UI Screenshots: The Nano Banana 2 model demonstrates advanced image generation by producing a near-perfect screenshot of MrBeast’s YouTube page within a Windows 11 browser, maintaining coherence and likeness.

Nano Banana 2 generates a near perfect screenshot of MrBeast on the YouTube homepage, inside a browser, on Windows 11, while keeping coherency and likeness - this model is very impressive

CodeWave: AI-Powered Git Commit Analysis for Smarter Code Reviews: The CodeWave Node.js CLI tool analyzes Git commits, generates interactive HTML reports, and uses AI agents to score changes and reach consensus, integrating with CI/CD pipelines.

We dramatically improved code reviews, starting at the commit level
- GitHub Repository
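
As a rough illustration of what "scoring changes and reaching consensus" could look like, the Python sketch below combines per-agent scores for a single commit by taking the median and flagging high disagreement. The agent names, the 0-10 scale, and the `max_spread` threshold are all hypothetical; this is not CodeWave's actual algorithm, only one simple way such a step might work.

```python
from statistics import median


def consensus_score(agent_scores: dict[str, float], max_spread: float = 3.0) -> dict:
    """Combine per-agent scores (assumed 0-10 scale) for one commit.

    The median is used as the consensus value because it is robust to a
    single outlier agent; `agreed` is False when the gap between the
    highest and lowest score exceeds `max_spread` (a hypothetical threshold).
    """
    if not agent_scores:
        raise ValueError("at least one agent score is required")
    values = list(agent_scores.values())
    spread = max(values) - min(values)
    return {
        "score": median(values),
        "spread": spread,
        "agreed": spread <= max_spread,
    }


# Hypothetical scores from three reviewer agents for one commit.
result = consensus_score({"security": 6.0, "readability": 8.0, "testing": 7.0})
print(result)
```

A commit whose agents disagree strongly (`agreed` is False) could be routed to a human reviewer instead of being scored automatically.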

Claude Code Voice Hooks: Auditory Feedback for Developers: A new tool adds real-time sound effects to Claude Code actions (e.g., errors, completions), providing auditory feedback without requiring console monitoring.

Turned Claude Code into a soundboard — every action now makes a sound 🔊
- GitHub Repository
- Demo Video
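
The idea of mapping actions to sounds can be sketched in a few lines of Python. The snippet below assumes a hook script is handed a JSON payload that names the event and picks a sound file for it. The `hook_event_name` field, the event names, and the file names are assumptions for illustration, not the linked tool's implementation; actual audio playback is left out because it is platform-specific.

```python
import json

# Hypothetical mapping from event name to sound file.
SOUNDS = {
    "PostToolUse": "click.wav",
    "Notification": "ping.wav",
    "Stop": "done.wav",
}
DEFAULT_SOUND = "blip.wav"


def pick_sound(payload_json: str) -> str:
    """Return the sound file for the event named in a JSON payload.

    Falls back to DEFAULT_SOUND when the payload is not valid JSON,
    is not a JSON object, or names an event with no mapped sound.
    """
    try:
        payload = json.loads(payload_json)
    except json.JSONDecodeError:
        return DEFAULT_SOUND
    event = payload.get("hook_event_name") if isinstance(payload, dict) else None
    if not isinstance(event, str):
        return DEFAULT_SOUND
    return SOUNDS.get(event, DEFAULT_SOUND)


print(pick_sound('{"hook_event_name": "Stop"}'))
```

Keeping the lookup separate from playback makes the mapping easy to test and lets the sound table be swapped without touching the rest of the script.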

AI in Healthcare & Strategic Moves

OpenAI Explores Consumer Health Apps: OpenAI is reportedly considering an expansion into consumer health applications built on its AI models.

OpenAI is weighing a move into consumer health apps
- Business Insider Article