**Gemini 3 Pro Beats GPT-5 in Factuality as Mistral Drops 12+ New LLMs in One Week**
**Benchmarks & Model Evaluations**
DeepMind Releases FACTS Benchmark: Gemini 3 Pro Outperforms GPT-5 in Factuality (68.8% vs 61.8%)
Google DeepMind introduced the FACTS Benchmark, a new evaluation suite for measuring AI model factuality. Gemini 3 Pro scored 68.8%, surpassing GPT-5’s 61.8%, while even Gemini 2.5 Pro outperformed GPT-5. The benchmark highlights gaps in multimodal factuality and provides a standardized metric for truthfulness in AI.
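Benchmarks like this are commonly scored by a judge model that checks each response against a reference source. The snippet below is a minimal sketch of that general approach, not DeepMind's actual harness; the judge model name and rubric wording are assumptions.

```python
# Minimal sketch of a grounding-style factuality check (not DeepMind's harness).
# Assumes the OpenAI Python SDK; the judge model and rubric are illustrative only.
from openai import OpenAI

client = OpenAI()

def judge_factuality(document: str, response: str) -> bool:
    """Ask a judge model whether every claim in `response` is supported by `document`."""
    verdict = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        messages=[
            {"role": "system",
             "content": "Answer SUPPORTED or UNSUPPORTED: is every claim in the "
                        "response backed by the document?"},
            {"role": "user",
             "content": f"Document:\n{document}\n\nResponse:\n{response}"},
        ],
    )
    return "UNSUPPORTED" not in verdict.choices[0].message.content.upper()

# A benchmark-style score is then just the fraction of responses judged supported.
```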
**Model Releases & Updates**
Mistral AI Releases 12+ New LLMs in a Single Week: 3x OpenAI’s 6-Year Output
Mistral AI unveiled a wave of new models (3B to 675B parameters), including coding, reasoning, and instruct variants, all optimized for local use. The release spans models like Devstral (24B, 123B), Ministral-3 (3B–14B), and Mistral-Large-3 (675B), showcasing rapid innovation and hardware flexibility.
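For readers who want to try the smaller end of the range locally, the sketch below shows a standard Hugging Face transformers loading pattern; the repo id is a placeholder, so check Mistral's actual Hugging Face listings for the exact model names.

```python
# Minimal sketch of running one of the smaller releases locally with transformers.
# The repo id below is hypothetical, not a confirmed model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "mistralai/Ministral-3B-Instruct"  # placeholder; check the hub listing
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a haiku about local inference."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```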
GPT-5.2 Rolls Out on Cursor with Enhanced Performance
OpenAI’s GPT-5.2 is now available on Cursor; early users report it feels noticeably faster and more capable than prior versions.
ByteShape Releases Optimized GGUF Models for Qwen3 and Llama 3.1
ByteShape’s ShapeLearn method optimizes quantization for Qwen3-4B-Instruct and Llama-3.1-8B-Instruct, achieving better performance with lower memory usage. The models are now available on Hugging Face.
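A typical way to pull one of these quants down and run it is llama-cpp-python's `from_pretrained` helper; the repo id and filename below are placeholders rather than confirmed ByteShape artifact names.

```python
# Sketch of fetching a GGUF quant from Hugging Face with llama-cpp-python.
# Repo id and filename are placeholders — check ByteShape's actual listings.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ByteShape/Qwen3-4B-Instruct-GGUF",  # hypothetical repo id
    filename="*Q4_K_M.gguf",                     # glob for the quant you want
    n_ctx=4096,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}]
)
print(out["choices"][0]["message"]["content"])
```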
**Infrastructure & Hardware Innovations**
Nvidia-Backed Starcloud Trains First AI Model in Space Using an H100 GPU
Starcloud successfully trained an AI model (Google Gemma) in orbit aboard the Starcloud-1 satellite, powered by an Nvidia H100 GPU and solar energy. This milestone paves the way for orbital data centers with 24/7 uptime and free cooling.
FlashAttention Now Available for Non-Nvidia GPUs (AMD, Intel Arc, Vulkan)
Aule Technologies’ FlashAttention implementation extends support to AMD, Intel Arc, and Vulkan-capable GPUs, eliminating the CUDA dependency for efficient ML inference on non-Nvidia hardware.
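As a correctness reference (not Aule's kernel code), the sketch below shows the exact attention any FlashAttention implementation must reproduce, and compares it against PyTorch's built-in fused path, which already dispatches to whatever backend the local device supports.

```python
# Reference for what a FlashAttention kernel must compute: exact softmax attention.
# The naive version materializes the full (seq x seq) score matrix — precisely the
# memory cost that Flash-style tiled kernels avoid. Correctness baseline only.
import torch

def naive_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim)
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale   # (batch, heads, seq, seq)
    return torch.softmax(scores, dim=-1) @ v     # (batch, heads, seq, head_dim)

q = k = v = torch.randn(1, 8, 1024, 64)
# PyTorch's fused path picks a backend per device (CUDA, ROCm, CPU, ...):
fused = torch.nn.functional.scaled_dot_product_attention(q, k, v)
print(torch.allclose(naive_attention(q, k, v), fused, atol=1e-4))
```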
**Developer Tools & Optimizations**
Unsloth Enables 3x Faster LLM Training with 30–90% Less VRAM
Unsloth’s new Triton kernels and smart auto-packing allow training models like Qwen3-4B on consumer hardware with <3.9GB VRAM, achieving 3x speedups without accuracy loss.
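As a rough illustration of the workflow (the model id and hyperparameters are placeholders, not Unsloth's exact recipe behind the 3x claim), a low-VRAM fine-tune typically looks like this:

```python
# Hedged sketch of a low-VRAM LoRA fine-tune with Unsloth; values are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Instruct",  # placeholder hub id
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit base weights keep VRAM in the few-GB range
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank; only adapter weights are trained
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# From here, training proceeds with a standard TRL SFTTrainer over a packed dataset.
```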
llama.cpp Updates CLI for Improved Usability
A new command-line interface has been merged into llama.cpp, streamlining how users run and interact with local models from the terminal.
Mistral Vibe CLI Expands Context Window to 200K
Mistral AI updated Mistral Vibe with a 200K-token context window, letting developers keep larger inputs and longer multi-step tasks within a single session.
**AI Application Enhancements**
ChatGPT Adds Quiz Generation Feature
ChatGPT now supports automatic multiple-choice quiz generation, expanding its utility for education, training, and interactive learning.
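The feature lives in the ChatGPT product itself, but developers can approximate it through the API; the sketch below uses JSON-mode chat completions with a placeholder model name and an assumed output schema.

```python
# Hedged sketch of reproducing multiple-choice quiz generation via the API.
# Model name and JSON schema are assumptions, not the ChatGPT feature's internals.
import json
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": 'Return JSON of the form {"questions": [{"question": str, '
                    '"choices": [str], "answer": str}]}.'},
        {"role": "user", "content": "Make a 3-question quiz on TCP vs UDP."},
    ],
)
quiz = json.loads(resp.choices[0].message.content)
for q in quiz["questions"]:
    print(q["question"], q["choices"], "->", q["answer"])
```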
Developer Builds "Bullshit Detector" for RAG Hallucinations
A Node.js + pgvector middleware tool detects and blocks likely hallucinations in RAG apps by measuring the semantic distance between a model’s answer and the retrieved source passages, improving response reliability.
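The original middleware is Node.js, but the core check is simple enough to sketch in a few lines of Python: embed the answer, compare it against the retrieved chunks, and block it if nothing is close. The embedding model and threshold below are illustrative, not the tool's actual settings.

```python
# Sketch of the semantic-distance hallucination check (idea only, not the tool itself).
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

def looks_ungrounded(answer: str, retrieved_chunks: list[str], max_dist: float = 0.6) -> bool:
    vecs = embedder.encode([answer] + retrieved_chunks, normalize_embeddings=True)
    answer_vec, chunk_vecs = vecs[0], vecs[1:]
    # Cosine distance = 1 - cosine similarity (vectors are unit-normalized).
    dists = 1.0 - chunk_vecs @ answer_vec
    return float(dists.min()) > max_dist  # no chunk is close enough -> likely hallucinated

print(looks_ungrounded("The warranty lasts 10 years.", ["Warranty coverage is 2 years."]))
```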
OpenAI Publishes ChatGPT App Display Mode Guidelines
A new reference guide details Inline, Fullscreen, and Picture-in-Picture display modes for ChatGPT apps, helping developers optimize UI/UX for AI-driven interfaces.