**Gemini 3 Pro Beats GPT-5 in Factuality as Mistral Drops 12+ New LLMs in One Week**
**Benchmarks & Model Evaluations**
DeepMind Releases FACTS Benchmark: Gemini 3 Pro Outperforms GPT-5 in Factuality (68.8% vs 61.8%)
Google DeepMind introduced the FACTS Benchmark, a new evaluation suite for measuring AI model factuality. Gemini 3 Pro scored 68.8%, surpassing GPT-5’s 61.8%, while even Gemini 2.5 Pro outperformed GPT-5. The benchmark highlights gaps in multimodal factuality and provides a standardized metric for truthfulness in AI.
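Benchmarks like this are commonly scored by a judge model that checks each response against a reference source. The snippet below is a minimal sketch of that general approach, not DeepMind's actual harness; the judge model name and rubric wording are assumptions.

```python
# Minimal sketch of a grounding-style factuality check (not DeepMind's harness).
# Assumes the OpenAI Python SDK; the judge model and rubric are illustrative only.
from openai import OpenAI

client = OpenAI()

def judge_factuality(document: str, response: str) -> bool:
    """Ask a judge model whether every claim in `response` is supported by `document`."""
    verdict = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[
            {"role": "system",
             "content": "Answer SUPPORTED or UNSUPPORTED: is every claim in the "
                        "response backed by the document?"},
            {"role": "user",
             "content": f"Document:\n{document}\n\nResponse:\n{response}"},
        ],
    )
    return "UNSUPPORTED" not in verdict.choices[0].message.content.upper()

# A benchmark-style score is then just the fraction of responses judged supported.
```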
**Model Releases & Updates**
Mistral AI Releases 12+ New LLMs in a Single Week: 3x OpenAI’s 6-Year Output
Mistral AI unveiled a wave of new models (3B to 675B parameters), including coding, reasoning, and instruct variants, all optimized for local use. The release spans models like Devstral (24B, 123B), Ministral-3 (3B–14B), and Mistral-Large-3 (675B), showcasing rapid innovation and hardware flexibility.
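For readers who want to try the smaller end of the range locally, the sketch below shows a standard Hugging Face transformers loading pattern; the repo id is a placeholder, so check Mistral's actual Hugging Face listings for the exact model names.

```python
# Minimal sketch of running one of the smaller releases locally with transformers.
# The repo id below is hypothetical, not a confirmed model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Ministral-3B-Instruct"  # placeholder; check the hub listing
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a haiku about local inference."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```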
GPT-5.2 Rolls Out on Cursor with Enhanced Performance
OpenAI’s GPT-5.2 is now available on Cursor; early users report it feels noticeably faster and more capable than prior versions.
ByteShape Releases Optimized GGUF Models for Qwen3 and Llama 3.1
ByteShape’s ShapeLearn method optimizes quantization for Qwen3-4B-Instruct and Llama-3.1-8B-Instruct, achieving better performance with lower memory usage. The models are now available on Hugging Face.
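A typical way to pull one of these quants down and run it is llama-cpp-python's `from_pretrained` helper; the repo id and filename below are placeholders rather than confirmed ByteShape artifact names.

```python
# Sketch of fetching a GGUF quant from Hugging Face with llama-cpp-python.
# Repo id and filename are placeholders — check ByteShape's actual listings.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ByteShape/Qwen3-4B-Instruct-GGUF",  # hypothetical repo id
    filename="*Q4_K_M.gguf",                     # glob for the quant you want
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}]
)
print(out["choices"][0]["message"]["content"])
```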
**Infrastructure & Hardware Innovations**
Nvidia-Backed Starcloud Trains First AI Model in Space Using an H100 GPU
Starcloud successfully trained an AI model (Google Gemma) in orbit aboard the Starcloud-1 satellite, powered by an Nvidia H100 GPU and solar energy. This milestone paves the way for orbital data centers with 24/7 uptime and free cooling.
FlashAttention Now Available for Non-Nvidia GPUs (AMD, Intel Arc, Vulkan)
Aule Technologies’ FlashAttention implementation extends support to AMD, Intel Arc, and Vulkan-capable GPUs, eliminating the CUDA dependency for efficient ML inference on non-Nvidia hardware.
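As a correctness reference (not Aule's kernel code), the sketch below shows the exact attention any FlashAttention implementation must reproduce, and compares it against PyTorch's built-in fused path, which already dispatches to whatever backend the local device supports.

```python
# Reference for what a FlashAttention kernel must compute: exact softmax attention.
# The naive version materializes the full (seq x seq) score matrix — precisely the
# memory cost that Flash-style tiled kernels avoid. Correctness baseline only.
import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale   # (batch, heads, seq, seq)
    return torch.softmax(scores, dim=-1) @ v     # (batch, heads, seq, head_dim)

q = k = v = torch.randn(1, 8, 1024, 64)
# PyTorch's fused path picks a backend per device (CUDA, ROCm, CPU, ...):
fused = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(torch.allclose(naive_attention(q, k, v), fused, atol=1e-4))
```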
**Developer Tools & Optimizations**
Unsloth Enables 3x Faster LLM Training with 30–90% Less VRAM
Unsloth’s new Triton kernels and smart auto-packing allow training models like Qwen3-4B on consumer hardware with <3.9GB VRAM, achieving 3x speedups without accuracy loss.
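As a rough illustration of the workflow (the model id and hyperparameters are placeholders, not Unsloth's exact recipe behind the 3x claim), a low-VRAM fine-tune typically looks like this:

```python
# Hedged sketch of a low-VRAM LoRA fine-tune with Unsloth; values are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Instruct",  # placeholder hub id
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit base weights keep VRAM in the few-GB range
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank; only adapter weights are trained
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, training proceeds with a standard TRL SFTTrainer over a packed dataset.
```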
llama.cpp Updates CLI for Improved Usability
A new command-line interface has been merged into llama.cpp, streamlining how users run and interact with local models from the terminal.
Mistral Vibe CLI Expands Context Window to 200K
Mistral AI updated Mistral Vibe with a 200K-token context window, letting developers keep larger inputs and longer multi-step tasks within a single session.
**AI Application Enhancements**
ChatGPT Adds Quiz Generation Feature
ChatGPT now supports automatic multiple-choice quiz generation, expanding its utility for education, training, and interactive learning.
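The feature lives in the ChatGPT product itself, but developers can approximate it through the API; the sketch below uses JSON-mode chat completions with a placeholder model name and an assumed output schema.

```python
# Hedged sketch of reproducing multiple-choice quiz generation via the API.
# Model name and JSON schema are assumptions, not the ChatGPT feature's internals.
import json
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": 'Return JSON of the form {"questions": [{"question": str, '
                    '"choices": [str], "answer": str}]}.'},
        {"role": "user", "content": "Make a 3-question quiz on TCP vs UDP."},
    ],
)
quiz = json.loads(resp.choices[0].message.content)
for q in quiz["questions"]:
    print(q["question"], q["choices"], "->", q["answer"])
```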
Developer Builds "Bullshit Detector" for RAG Hallucinations
A Node.js + pgvector middleware tool detects and blocks likely hallucinations in RAG apps by measuring the semantic distance between a model’s answer and the retrieved source passages, improving response reliability.
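The original middleware is Node.js, but the core check is simple enough to sketch in a few lines of Python: embed the answer, compare it against the retrieved chunks, and block it if nothing is close. The embedding model and threshold below are illustrative, not the tool's actual settings.

```python
# Sketch of the semantic-distance hallucination check (idea only, not the tool itself).
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def looks_ungrounded(answer: str, retrieved_chunks: list[str], max_dist: float = 0.6) -> bool:
    vecs = embedder.encode([answer] + retrieved_chunks, normalize_embeddings=True)
    answer_vec, chunk_vecs = vecs[0], vecs[1:]
    # Cosine distance = 1 - cosine similarity (vectors are unit-normalized).
    dists = 1.0 - chunk_vecs @ answer_vec
    return float(dists.min()) > max_dist  # no chunk is close enough -> likely hallucinated

print(looks_ungrounded("The warranty lasts 10 years.", ["Warranty coverage is 2 years."]))
```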
OpenAI Publishes ChatGPT App Display Mode Guidelines
A new reference guide details Inline, Fullscreen, and Picture-in-Picture display modes for ChatGPT apps, helping developers optimize UI/UX for AI-driven interfaces.