1 min read

**Gemini 3.0 December Launch & NVIDIA’s RTX 5090 Demos Qwen3-VL at Blazing Speeds**

AI Models & Releases

Gemini 3.0 Announcement: Google CEO Sundar Pichai confirmed that Gemini 3.0, the next major version of Google's flagship AI model, will launch in December 2025, continuing the annual December release cycle. The update is expected to bring significant advancements over Gemini 2.5.


Hardware & Benchmarks

NVIDIA RTX Pro 6000 vs. DGX Spark: Benchmarks reveal the RTX Pro 6000 outperforms NVIDIA’s DGX Spark by 6-7x in LLM inference tasks, showcasing superior memory bandwidth and efficiency across various batch sizes and models.

RTX Pro 6000 Blackwell vLLM Performance: The Blackwell Workstation Edition excels in handling 120B models, demonstrating high throughput, multi-user scaling, and efficient extended output generation, positioning it as a top-tier choice for local AI workloads.

NVIDIA RTX 5090 Demo with Qwen3-VL GGUF: NVIDIA supplied an RTX 5090 GPU for a demo of Qwen3-VL GGUF models (4B/8B), highlighting high token throughput and optimized VRAM usage for local AI applications.


AI Research & Techniques

Cerebras REAP Pruning for MoE Models: Cerebras introduced REAP, a one-shot pruning method for Mixture-of-Experts (MoE) models, achieving minimal accuracy loss while compressing Qwen3-Coder-480B to 363B and 246B. Pruned checkpoints are available on Hugging Face.
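The core idea of one-shot expert pruning can be illustrated with a toy sketch: score each expert by a saliency measure and drop the lowest-scoring ones without retraining. The saliency used here (mean router weight × expert output norm) is an illustrative stand-in, not Cerebras's exact REAP criterion, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d, tokens = 8, 16, 100  # toy sizes, not a real MoE layer

# Toy MoE layer: each expert is a linear map; the router assigns
# per-token gate weights via a softmax over expert logits.
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
router_logits = rng.normal(size=(tokens, num_experts))
gates = np.exp(router_logits) / np.exp(router_logits).sum(axis=1, keepdims=True)
x = rng.normal(size=(tokens, d))

# Illustrative saliency per expert: average over tokens of
# (router gate weight * norm of that expert's output).
saliency = np.array([
    (gates[:, e] * np.linalg.norm(x @ experts[e].T, axis=1)).mean()
    for e in range(num_experts)
])

# One-shot prune: keep the top-k most salient experts, drop the rest.
keep = 6
kept = sorted(np.argsort(saliency)[-keep:])
pruned_experts = [experts[e] for e in kept]
```

In a real checkpoint the router would also be re-normalized over the surviving experts; this sketch only shows the scoring-and-dropping step.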

CISPO Algorithm for RL Training: A Reddit user shared insights on treating reinforcement learning (RL) training like an SRE (Site Reliability Engineering) project, emphasizing the CISPO algorithm's stability and efficiency in late-stage training and the value of continuous monitoring.


AI Products & Updates

Perplexity Voice Mode Update: Perplexity’s voice mode received a minor update, adding new voices (e.g., Kyrin, Tylis) and leveraging OpenAI’s /v1/realtime/calls endpoint for improved stability. The voices align with OpenAI’s existing options (e.g., cedar, alloy).