29 Oct 2025

OpenAI’s $135B Microsoft Deal Meets Breakthroughs: Kani TTS 5x Faster, FlashTensors Serves 100 Models on 1 GPU

New AI Models & Tools

Kani TTS English Released – 400M Parameter TTS Model 5x Faster Than Realtime on RTX 4080
The open-source Kani TTS English model (400M parameters) is now available under the Apache 2.0 License. It targets real-time conversation, affordable deployment, and accessibility tools, and it generates speech about 5x faster than real time on an RTX 4080. Pretrained checkpoints, GitHub repos, and a Discord community are available.
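
To make the "5x real-time" figure concrete, here is a minimal arithmetic sketch. The function names are hypothetical and are not part of Kani TTS; the only number taken from the announcement is the 5x speedup, and real throughput will vary with text length, batch size, and hardware.

```python
def realtime_speedup(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if audio_seconds <= 0 or synthesis_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / synthesis_seconds


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Compute time needed to generate a clip at a given real-time speedup."""
    if audio_seconds <= 0 or speedup <= 0:
        raise ValueError("inputs must be positive")
    return audio_seconds / speedup


# At the reported 5x speedup, a 10-second utterance needs about 2 seconds.
ten_second_clip = synthesis_time(10.0, 5.0)
print(ten_second_clip)  # 2.0
```

A speedup above 1.0 means audio is produced faster than it plays back, which is the headroom a conversational system needs to cover the rest of its pipeline.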

FlashTensors: Serve 100 Large AI Models on a Single GPU with Minimal Latency
FlashTensors is an open-source project for serving 100+ large AI models from a single GPU with low impact on time-to-first-token. It speeds up model loading from SSD to VRAM (up to 10x faster) and integrates with vLLM and Transformers, targeting serverless inference, robotics, and local agents.
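
Serving more models than fit in VRAM generally means swapping them in on demand, which is why load speed matters for time-to-first-token. The sketch below illustrates that general idea with a least-recently-used cache. It is not the FlashTensors API: `ModelCache`, `fake_loader`, and the capacity value are all hypothetical, and the loader stands in for whatever actually moves weights from SSD to VRAM.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

M = TypeVar("M")


class ModelCache(Generic[M]):
    """Keep at most `capacity` models resident, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], M], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader
        self._capacity = capacity
        self._resident: "OrderedDict[str, M]" = OrderedDict()
        self.loads = 0  # number of cache misses, i.e. loads from storage

    def get(self, name: str) -> M:
        if name in self._resident:
            # Cache hit: mark as most recently used, no load needed.
            self._resident.move_to_end(name)
            return self._resident[name]
        if len(self._resident) >= self._capacity:
            # Cache full: drop the least recently used model to free space.
            self._resident.popitem(last=False)
        model = self._loader(name)  # the slow step a fast loader shortens
        self.loads += 1
        self._resident[name] = model
        return model

    def resident(self) -> List[str]:
        """Names of resident models, least recently used first."""
        return list(self._resident)


def fake_loader(name: str) -> str:
    # Hypothetical stand-in for reading weights from SSD into VRAM.
    return f"weights:{name}"


cache = ModelCache(fake_loader, capacity=2)
for requested in ["a", "b", "a", "c", "b"]:
    cache.get(requested)
print(cache.loads, cache.resident())
```

Every miss pays the full load cost before the first token can be produced, so the faster the load step, the more models can share one GPU without a noticeable latency penalty.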

Industry & Partnerships

Microsoft Secures 27% OpenAI Stake Worth ~$135B, Extends IP Rights to 2032
Microsoft and OpenAI finalized a landmark deal that gives Microsoft a roughly 27% stake in OpenAI, valued at about $135B, as part of OpenAI’s transition to a for-profit public benefit corporation. Microsoft retains exclusive IP rights to OpenAI’s models (including post-AGI developments) until 2032, while OpenAI commits to purchasing $250B in Azure services. Microsoft’s market cap surpassed $4T following the announcement.

Model Updates & Community Discussions

Qwen3 Max "Thinking" Variant Released – Early Benchmarks Show Coding Potential
Users are discussing Qwen3 Max Thinking, a new variant optimized for coding tasks. They note its tendency to "overthink" but also its potential to outperform proprietary models on coding benchmarks. A dedicated coding-focused variant has also been released.
