1 min read

**Gemini 3.0 December Launch & NVIDIA’s RTX 5090 Demos Qwen3-VL at Blazing Speeds**

AI Models & Releases

Gemini 3.0 Announcement: Google CEO Sundar Pichai confirmed that Gemini 3.0, the next major version of Google's flagship AI model, will launch in December 2025, continuing the annual December release cycle. The update is expected to bring significant advancements over Gemini 2.5.


Hardware & Benchmarks

NVIDIA RTX Pro 6000 vs. DGX Spark: Benchmarks reveal the RTX Pro 6000 outperforms NVIDIA’s DGX Spark by 6-7x in LLM inference tasks, showcasing superior memory bandwidth and efficiency across various batch sizes and models.

RTX Pro 6000 Blackwell vLLM Performance: The Blackwell Workstation Edition excels in handling 120B models, demonstrating high throughput, multi-user scaling, and efficient extended output generation, positioning it as a top-tier choice for local AI workloads.

NVIDIA RTX 5090 Demo with Qwen3-VL GGUF: NVIDIA supplied an RTX 5090 GPU for a demo of Qwen3-VL GGUF models (4B/8B), highlighting high token throughput and optimized VRAM usage for local AI applications.


AI Research & Techniques

Cerebras REAP Pruning for MoE Models: Cerebras introduced REAP, a one-shot pruning method for Mixture-of-Experts (MoE) models, achieving minimal accuracy loss while compressing Qwen3-Coder-480B to 363B and 246B. Pruned checkpoints are available on Hugging Face.
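The core idea of one-shot expert pruning can be illustrated with a toy sketch: score each expert by a saliency measure and drop the lowest-scoring ones without retraining. The saliency used here (mean router weight × expert output norm) is an illustrative stand-in, not Cerebras's exact REAP criterion, and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, d, tokens = 8, 16, 100  # toy sizes, not a real MoE layer

# Toy MoE layer: each expert is a linear map; the router assigns
# per-token gate weights via a softmax over expert logits.
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
router_logits = rng.normal(size=(tokens, num_experts))
gates = np.exp(router_logits) / np.exp(router_logits).sum(axis=1, keepdims=True)
x = rng.normal(size=(tokens, d))

# Illustrative saliency per expert: average over tokens of
# (router gate weight * norm of that expert's output).
saliency = np.array([
    (gates[:, e] * np.linalg.norm(x @ experts[e].T, axis=1)).mean()
    for e in range(num_experts)
])

# One-shot prune: keep the top-k most salient experts, drop the rest.
keep = 6
kept = sorted(np.argsort(saliency)[-keep:])
pruned_experts = [experts[e] for e in kept]
```

In a real checkpoint the router would also be re-normalized over the surviving experts; this sketch only shows the scoring-and-dropping step.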

CISPO Algorithm for RL Training: A Reddit user shared insights on treating reinforcement learning (RL) training like an SRE (Site Reliability Engineering) project, emphasizing the CISPO algorithm's stability and efficiency in late-stage training and the value of continuous monitoring.


AI Products & Updates

Perplexity Voice Mode Update: Perplexity’s voice mode received a minor update, adding new voices (e.g., Kyrin, Tylis) and leveraging OpenAI’s /v1/realtime/calls endpoint for improved stability. The voices align with OpenAI’s existing options (e.g., cedar, alloy).