10 Oct 2025

Microsoft & NVIDIA Supercharge OpenAI with GB300 NVL72 as Google’s Gemini DeepThink Cracks Frontier Math

AI Infrastructure & Compute

Microsoft and NVIDIA Launch First At-Scale GB300 NVL72 Cluster for OpenAI: Microsoft and NVIDIA have deployed the world’s first at-scale NVIDIA GB300 NVL72 supercomputing cluster on Azure, enabling OpenAI to train multitrillion-parameter models in days instead of weeks. This marks a major leap in AI training infrastructure, drastically accelerating large-scale model development.

Microsoft unveils the first at-scale NVIDIA GB300 NVL72 cluster, letting OpenAI train multitrillion-parameter models in days instead of weeks
- NVIDIA Blog: Microsoft Azure Deploys World’s First GB300 NVL72 Supercomputing Cluster for OpenAI

New Models & Research Breakthroughs

Gemini DeepThink Achieves SOTA in Frontier Math: Google’s Gemini DeepThink has posted a new state-of-the-art (SOTA) result on the FrontierMath benchmark of advanced mathematical problems, a sign of progress in AI-driven high-level reasoning and problem-solving.

Gemini DeepThink achieves SOTA performance on Frontier Math
- Reddit Gallery (Additional Context)

Qwen3 VL 4B Model Announced for Enhanced OCR: Alibaba’s Qwen3 VL 4B appears to be slated for release, with early documentation suggesting improved OCR (Optical Character Recognition) capabilities. The model builds on the Qwen2 VL and Qwen2.5 VL 3B/7B models, which were widely used for OCR tasks.

Qwen3 VL 4B to be released?
- Qwen3-VL Cookbook: Long Document Understanding

Microsoft Releases UserLM-8B, a "User-Side" Conversational Model: Microsoft’s UserLM-8B is a novel LLM trained to simulate the "user" role in conversations, unlike traditional assistant-focused models. This approach aims to create more natural, dynamic interactions by modeling user behavior rather than assistant responses.

microsoft/UserLM-8b - “Unlike typical LLMs that are trained to play the role of the 'assistant' in conversation, we trained UserLM-8b to simulate the 'user' role”
- Hugging Face: UserLM-8B Model

AI Tools & Developer Innovations

Multimodal Local RAG System Built with LM Studio: A developer built a local multimodal RAG (Retrieval-Augmented Generation) system with LM Studio that can sync and process 10,000+ Google Docs files. The system performs well with Gemma 3 4B, and larger models are expected to improve results further. The developer plans to share the full code soon.

I made a multimodal local RAG system with LM Studio
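
For readers new to RAG, the retrieval step can be sketched in a few lines of Python. This is a generic illustration, not the developer's code, which has not been published yet. The `embed`, `retrieve`, and `build_prompt` helpers and the sample documents are hypothetical, and a toy word-count vector stands in for a real embedding model. No LM Studio calls are made; the resulting prompt is what would be handed to a local model such as Gemma 3 4B.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=2):
    """Return the names of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda name: cosine(q, embed(docs[name])), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble the retrieved documents and the question into one prompt string."""
    names = retrieve(query, docs, k)
    context = "\n\n".join(f"[{name}]\n{docs[name]}" for name in names)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"


# Hypothetical sample corpus standing in for synced documents.
docs = {
    "budget.txt": "Quarterly budget review and marketing spend for the team.",
    "onboarding.txt": "Onboarding checklist for new engineers joining the team.",
    "travel.txt": "Travel reimbursement policy and expense report deadlines.",
}

if __name__ == "__main__":
    print(build_prompt("What is the travel reimbursement policy?", docs, k=1))
```

A real system would swap `embed` for a proper embedding model and store vectors in an index rather than re-embedding every document per query, which matters at the 10,000+ file scale described above.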