19 Dec 2025

DeepMind & OpenAI Unite for "AI Manhattan Project" Amid GPT-5.2 Codex Coding Win

AI Model Performance & Benchmarks

GPT-5 Struggles on FormulaOne Hard Problems Benchmark: GPT-5 scored 0% on the hardest tier of reasoning tasks in the new FormulaOne benchmark, which is designed to test advanced LLM capabilities. The benchmark remains "unsaturated," meaning current models are still far from maxing it out, and the result highlights gaps in current AI reasoning. The researchers released the dataset and methodology for further evaluation.

GPT-5 Scored 0% on FormulaOne Hard Problems
- GitHub: FormulaOne Dataset
- arXiv Paper

GPT-5.2 Codex Leads in SWE-Bench Pro Coding Benchmark: GPT-5.2 Codex achieved the highest accuracy (56.4%) on the SWE-Bench Pro benchmark, outperforming competitors like Claude Opus 4.5 (52.1%) and Gemini 3 Pro (43.3%). The results underscore advancements in AI-assisted software engineering.

GPT-5.2-Codex: SWE-Bench Pro scores compared to other models
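
To make the reported gap concrete, here is a minimal sketch that computes the leader's margin over each competitor in percentage points. It uses only the three scores quoted in this item; the function name `margins_over_rest` is illustrative and not part of any benchmark tooling.

```python
# SWE-Bench Pro accuracy (percent) as reported in this digest item.
scores = {
    "GPT-5.2 Codex": 56.4,
    "Claude Opus 4.5": 52.1,
    "Gemini 3 Pro": 43.3,
}


def margins_over_rest(scores):
    """Return the top-scoring model and its lead over every other model.

    Leads are expressed in percentage points and rounded to one decimal
    place to avoid floating-point noise.
    """
    leader = max(scores, key=scores.get)
    gaps = {
        name: round(scores[leader] - value, 1)
        for name, value in scores.items()
        if name != leader
    }
    return leader, gaps


leader, gaps = margins_over_rest(scores)
print(leader, gaps)
```

On these numbers the lead is 4.3 points over Claude Opus 4.5 and 13.1 points over Gemini 3 Pro.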

New Models & Releases

Mistral OCR 3 Launches with 74% Win Rate Over Predecessor: Mistral’s latest OCR model excels in processing handwriting, forms, scanned documents, and complex tables. It is now available via the AI Studio Playground and API, with benchmark results confirming superior performance across document types.

AI Infrastructure & Performance Optimizations

Kimi K2 Achieves 28.3 Tokens/Sec on 4x Mac Studio Cluster: A user benchmarked Kimi K2 on a cluster of four Mac Studios using two setups, llama.cpp RPC and Exo’s RDMA Tensor, and the latter was significantly faster. The comparison highlights Exo’s efficiency for local AI inference.

Kimi K2 Thinking at 28.3 t/s on 4x Mac Studio cluster
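
For a sense of what 28.3 tokens/sec means in practice, the sketch below converts a response length into generation time. It assumes a constant decode rate and ignores prompt processing and network overhead, neither of which is reported in this item; `generation_seconds` is an illustrative helper, not part of llama.cpp or Exo.

```python
def generation_seconds(num_tokens, tokens_per_second=28.3):
    """Estimate decode time for num_tokens at a constant throughput.

    Assumption: throughput stays flat for the whole response, which real
    runs will only approximate.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# A 1,000-token answer at the reported rate.
print(f"{generation_seconds(1000):.1f} s")
```

At the reported rate, a 1,000-token response would take roughly 35 seconds to generate.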

AI Tools & Developer Products

Echode: Agentic Coding Extension for VSCode: This new extension automates coding tasks (e.g., grepping, edits, diagnostics) with multiple modes (Agent, Plan, Ask). It supports various AI models, requires no config files, and is available on the VSCode Marketplace.

Echode - Agentic Coding Extension
- GitHub
- VSCode Marketplace

Industry Collaborations & Initiatives

Google DeepMind & OpenAI Partner for U.S. DOE’s "AI Manhattan Project": The two AI companies will combine their reasoning models with federal datasets to accelerate scientific breakthroughs by 2030, focusing on fusion energy, climate modeling, and quantum computing. The initiative, known as the Genesis Mission, aims to strengthen national security and advance sustainable energy solutions.

Big Collab: Google DeepMind and OpenAI officially join forces for the "AI Manhattan Project"
- DeepMind Blog
- OpenAI Announcement