Liquid AI’s LFM2.5 & Nvidia’s Rubin GPU (50 PFLOPS) Push AI Limits On-Device & in Cloud

New AI Models

Nvidia launches Alpamayo: Open AI models for human-like reasoning in autonomous vehicles
Nvidia introduced Alpamayo, a series of open AI models designed to let autonomous vehicles process information and make decisions in a way that resembles human cognition. The models aim to improve safety and efficiency in self-driving systems by bringing human-like reasoning to driving decisions.


Liquid AI releases LFM2.5: Frontier-grade reasoning in ultra-efficient on-device models
Liquid AI unveiled LFM2.5, a family of small (1.2B-parameter) foundation models optimized for on-device deployment. The models deliver "frontier-grade" reasoning, roughly a 2x speedup over comparable models such as Qwen3 and Llama 3.2, and support 4-bit quantization, letting them run on smartphones, laptops, and vehicles without a cloud dependency.
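
For a sense of what 4-bit deployment looks like in practice, here is a minimal sketch using Hugging Face transformers with bitsandbytes NF4 quantization. The model ID is an assumption (Liquid AI's actual checkpoint names may differ), and bitsandbytes requires an NVIDIA GPU, so real phone or laptop deployments would more likely go through GGUF/llama.cpp or a vendor runtime.

```python
# Minimal sketch: loading a small model with 4-bit (NF4) quantization via
# Hugging Face transformers + bitsandbytes. The model ID is an assumption;
# substitute whatever checkpoint Liquid AI actually publishes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "LiquidAI/LFM2.5-1.2B"  # hypothetical ID, not confirmed

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Explain step by step: what is 17 * 24?",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```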


AI Hardware & Infrastructure

NVIDIA Rubin GPU unveiled at CES: 50 PFLOPS inference, 3.6 TB/s NVLink bandwidth
NVIDIA showcased the Rubin GPU at CES, featuring major performance leaps over Blackwell: 50 PFLOPS for inference, 35 PFLOPS for training, 22 TB/s of HBM4 bandwidth, and 336 billion transistors. The architecture targets next-generation AI training and inference workloads at data-center scale.
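
One way to put the quoted figures in perspective is the roofline "ridge point": how many FLOPs the chip must perform per byte fetched from HBM to stay compute-bound. A back-of-the-envelope sketch, taking the PFLOPS numbers at face value (the precision formats behind them are not specified here):

```python
# Back-of-the-envelope ridge points from the quoted Rubin figures.
# Assumes the PFLOPS numbers are directly comparable to the HBM4 bandwidth;
# real utilization depends on precision format and kernel shape.
inference_flops = 50e15  # 50 PFLOPS (inference)
training_flops = 35e15   # 35 PFLOPS (training)
hbm_bandwidth = 22e12    # 22 TB/s HBM4

print(f"Inference ridge point: {inference_flops / hbm_bandwidth:.0f} FLOPs/byte")
print(f"Training ridge point:  {training_flops / hbm_bandwidth:.0f} FLOPs/byte")
# ~2273 and ~1591 FLOPs/byte: low-arithmetic-intensity work such as
# single-stream LLM decoding is bounded by the 22 TB/s, not the 50 PFLOPS.
```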


ik_llama.cpp achieves 3–4x speedup in multi-GPU setups
The ik_llama.cpp project (a fork of llama.cpp) demonstrated 3–4x faster local LLM inference in multi-GPU configurations. The gain comes from better GPU utilization across cards, reducing the need for high-end enterprise hardware for local AI workloads.
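
The reported result is specific to the ik_llama.cpp fork, but the underlying mechanism of splitting a model across cards is also exposed by mainline llama.cpp and its Python bindings. A minimal sketch with llama-cpp-python; the model path is a placeholder, and this configuration alone should not be expected to reproduce the 3–4x figure:

```python
# Sketch: splitting a GGUF model across two GPUs with llama-cpp-python
# (bindings for mainline llama.cpp; the fork's multi-GPU optimizations
# are not shown here). Model path is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen3-30b-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # fraction of tensors per GPU (two GPUs here)
    n_ctx=8192,               # context window
)

out = llm("Briefly explain multi-GPU tensor splitting.", max_tokens=128)
print(out["choices"][0]["text"])
```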


Significant performance gains in llama.cpp over 6 months
Collaborative efforts between NVIDIA engineers and the llama.cpp community yielded dramatic throughput improvements (e.g., 2–3x faster token generation for models like Qwen 3 30B) on RTX PCs and DGX Spark systems between September 2025 and January 2026.
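
Throughput claims like these are straightforward to sanity-check locally. A minimal tokens-per-second harness, again assuming llama-cpp-python and a placeholder model path; running the same script against builds from September and January gives a direct before/after comparison:

```python
# Sketch: measuring end-to-end generation throughput (tokens/sec).
# Path and prompt are placeholders; timing includes prompt processing.
import time
from llama_cpp import Llama

llm = Llama(model_path="./models/qwen3-30b-q4_k_m.gguf", n_gpu_layers=-1)

start = time.perf_counter()
out = llm("Write a short story about a robot.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
# Run the same script against two llama.cpp builds to compare throughput.
```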


AI Tools & Applications

AI code review showdown: Devstral 2 vs MiniMax M2 vs Grok Code Fast 1
A comparison of Devstral 2, MiniMax M2, and Grok Code Fast 1 for AI-assisted code review found that all three models effectively identified critical vulnerabilities in a TypeScript project. MiniMax M2 provided inline comments, Devstral 2 caught additional edge cases, and Grok Code Fast 1 completed its reviews fastest.
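
Comparisons like this typically drive each model through an OpenAI-compatible chat endpoint and hand it a diff to review. A minimal sketch of that pattern; the base URL, model name, and prompt are illustrative assumptions, not the harness actually used in the comparison:

```python
# Sketch: asking a code model to review a diff via an OpenAI-compatible
# chat API. Base URL and model name are placeholders; the comparison's
# actual harness and prompts are not described in the source.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

diff = """\
--- a/src/auth.ts
+++ b/src/auth.ts
@@ -10,7 +10,7 @@
-  const token = jwt.verify(raw, SECRET);
+  const token = jwt.decode(raw);  // skip verification for speed
"""

resp = client.chat.completions.create(
    model="devstral-2",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a strict code reviewer. "
         "Flag security vulnerabilities with file and line references."},
        {"role": "user", "content": f"Review this diff:\n\n{diff}"},
    ],
)
print(resp.choices[0].message.content)
```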