DeepMind & OpenAI Unite for "AI Manhattan Project" Amid GPT-5.2 Codex Coding Win
AI Model Performance & Benchmarks
GPT-5 Struggles on FormulaOne Hard Problems Benchmark: GPT-5 scored 0% on the hardest reasoning tasks in FormulaOne, a new benchmark designed to probe the limits of advanced LLM reasoning. The benchmark remains "unsaturated," which points to clear gaps in current AI reasoning. The researchers have released the dataset and methodology for further evaluation.
GPT-5.2 Codex Leads in SWE-Bench Pro Coding Benchmark: GPT-5.2 Codex achieved the highest accuracy (56.4%) on the SWE-Bench Pro benchmark, ahead of competitors such as Claude Opus 4.5 (52.1%) and Gemini 3 Pro (43.3%). The results underscore continued progress in AI-assisted software engineering.
New Models & Releases
Mistral OCR 3 Launches with 74% Win Rate Over Predecessor: Mistral's latest OCR model targets handwriting, forms, scanned documents, and complex tables. It is available in the AI Studio Playground and through the API, and Mistral's published benchmarks show it outperforming its predecessor across these document types.
AI Infrastructure & Performance Optimizations
Kimi K2 Achieves 28.3 Tokens/Sec on 4x Mac Studio Cluster: A user benchmarked Kimi K2 on a cluster of four Mac Studios, comparing llama.cpp's RPC backend against Exo's RDMA-based tensor parallelism. Exo delivered significantly higher throughput, highlighting its efficiency for local AI inference.
AI Tools & Developer Products
Echode: Agentic Coding Extension for VS Code: This new extension automates coding tasks such as grep searches, file edits, and diagnostics, and offers three modes (Agent, Plan, and Ask). It works with a range of AI models, needs no configuration files, and is available on the VS Code Marketplace.
Industry Collaborations & Initiatives
Google DeepMind & OpenAI Partner for U.S. DOE's "AI Manhattan Project": The two AI companies will combine their reasoning models with federal datasets to accelerate scientific breakthroughs by 2030, with a focus on fusion energy, climate modeling, and quantum computing. The effort is part of the DOE's Genesis Mission, which aims to strengthen national security and advance sustainable energy solutions.