27 Feb 2025

OpenAI to livestream GPT-4.5 release in 4.5 hours; Perplexity adds voice mode for real-time answers

New AI Models and Releases

OpenAI GPT-4.5 Release: OpenAI is set to livestream the release of GPT-4.5 in 4.5 hours. The new model is expected to be available to both free and pro users, and a new Pro+ tier may be announced alongside it.

OpenAI will livestream in 4.5 hours
- https://x.com/OpenAI/status/1895134318835704245

DeepSeek R2 Model: DeepSeek is reportedly prioritizing a new AI model, codenamed R2, and aims to release it before May 2025.

Report: DeepSeek prefers new AI model and wants to release R2 before May
- https://www.heise.de/en/news/Report-DeepSeek-prefers-new-AI-model-and-wants-to-release-R2-before-May-10297487.html

LLaDA Model: A new AI model called LLaDA (Large Language Diffusion Model) has been released, featuring a diffusion-based architecture that promises parallelized token generation, potentially reducing the need for high memory bandwidth. The model is available on Hugging Face with demo and weights, and a paper detailing its architecture and capabilities has been published on arXiv.

LLaDA - Large Language Diffusion Model (weights + demo)
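
To make the idea of parallelized token generation concrete, here is a minimal toy sketch of masked-diffusion-style decoding. It is not LLaDA's implementation: `toy_predict` is a hypothetical stand-in for the model that returns random tokens and confidences, and the unmasking schedule is an assumption chosen for simplicity. The point it illustrates is that every masked position is proposed in a single pass per step, and only the most confident proposals are kept, rather than producing one token at a time from left to right.

```python
import random

MASK = None  # marker for a position that has not been generated yet


def toy_predict(seq, rng):
    """Hypothetical stand-in for the model.

    Proposes a (token, confidence) pair for every masked position at once.
    """
    vocab = ["a", "b", "c", "d"]
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(seq)
        if tok is MASK
    }


def diffusion_decode(length, steps, seed=0):
    """Fill a fully masked sequence in a fixed number of parallel steps."""
    rng = random.Random(seed)
    seq = [MASK] * length
    filled_per_step = []
    for step in range(steps):
        proposals = toy_predict(seq, rng)  # one pass covers all masked slots
        if not proposals:
            break
        remaining_steps = steps - step
        # Keep enough proposals to finish within the remaining steps
        # (ceiling division); the rest stay masked for the next step.
        k = -(-len(proposals) // remaining_steps)
        keep = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in keep:
            seq[i] = proposals[i][0]
        filled_per_step.append(sum(tok is not MASK for tok in seq))
    return seq, filled_per_step


if __name__ == "__main__":
    tokens, progress = diffusion_decode(length=8, steps=4)
    print(tokens)
    print(progress)  # number of filled positions after each step
```

With 8 positions and 4 steps, the sketch fills two positions per step, so the number of model passes depends on the step count rather than on the sequence length.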

Microsoft Phi-4 Models: Microsoft has announced two new AI models, Phi-4-multimodal and Phi-4-mini. Phi-4-multimodal is a 5.6B parameter model that supports multiple languages and modalities, including text, vision, and speech. Phi-4-mini is a smaller, 3.8B parameter text model. Both models are openly available and can be run locally, which keeps user data on the user's own hardware.

Kokoro TTS 1.1: Kokoro TTS 1.1, a text-to-speech model, has been released with added support for Chinese. This interim release is part of ongoing development, with users awaiting more language support and training capabilities.

Kokoro TTS 1.1
- https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh

AI Model Benchmarks and Comparisons

Perplexity R1 1776 vs. DeepSeek R1: Results on lineage-bench show that Perplexity's R1 1776 model performs worse than DeepSeek R1 on complex problems, contradicting Perplexity's claim that the decensoring process left the model's reasoning abilities unaffected.

Perplexity R1 1776 performs worse than DeepSeek R1 for complex problems.
- https://github.com/fairydreaming/lineage-bench

AI Applications and Services

Perplexity Voice Mode: Perplexity has introduced a new voice mode in its iOS app, allowing users to ask questions aloud and receive real-time answers. The feature is expected to reach the Android and Mac apps soon.

AI Model Architectures and Optimizations

FlashMLA Integration: vLLM has integrated FlashMLA, DeepSeek's optimized decoding kernel for multi-head latent attention (MLA). Early results show output throughput gains of 2.8% to 16.8% over the previous TRITON_MLA backend, and further improvements are expected in the coming days.

vLLM just landed FlashMLA (DeepSeek - day 1) in vLLM and it is already boosting output throughput 2-16% - expect more improvements in the coming days
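
For readers who want to compare backends on their own hardware, the percentages above are relative gains in output throughput. A minimal sketch of that arithmetic follows; the function name and the tokens-per-second figures are hypothetical, picked only so the results line up with the reported 2.8% and 16.8% range, and they are not measurements from vLLM.

```python
def throughput_gain_pct(baseline_tok_s, new_tok_s):
    """Relative change in output throughput, in percent."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return (new_tok_s - baseline_tok_s) / baseline_tok_s * 100.0


if __name__ == "__main__":
    # Hypothetical tokens/second figures, for illustration only.
    print(f"{throughput_gain_pct(1000.0, 1028.0):.1f}%")  # 2.8%
    print(f"{throughput_gain_pct(1000.0, 1168.0):.1f}%")  # 16.8%
```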

DualPipe Algorithm: DeepSeek has open-sourced DualPipe, the bidirectional pipeline parallelism algorithm described in its DeepSeek-V3 technical report. It overlaps the computation and communication phases of forward and backward passes, which reduces idle time during AI model training.
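
Why overlapping computation with communication helps can be shown with a deliberately idealized cost model. The sketch below is not the DualPipe schedule itself: it assumes each chunk of work has a compute time and a communication time (hypothetical values in milliseconds), and that with perfect overlap the two run concurrently, so a chunk costs only the longer of the two.

```python
def step_time_serial(compute_ms, comm_ms):
    """Total time when each chunk computes first and communicates afterwards."""
    return sum(c + m for c, m in zip(compute_ms, comm_ms))


def step_time_overlapped(compute_ms, comm_ms):
    """Total time under an idealized overlap assumption.

    Communication for a chunk runs concurrently with computation,
    so each chunk costs only the longer of its two phases.
    """
    return sum(max(c, m) for c, m in zip(compute_ms, comm_ms))


if __name__ == "__main__":
    compute = [4, 4, 4]  # hypothetical per-chunk compute times (ms)
    comm = [3, 5, 2]     # hypothetical per-chunk communication times (ms)
    print(step_time_serial(compute, comm))      # 22
    print(step_time_overlapped(compute, comm))  # 13
```

In this toy model the saving equals the shorter phase of every chunk; real schedules also have to handle pipeline bubbles and dependencies between stages, which this sketch ignores.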