LLaDA2.0 & Flux 2 Break Barriers: 100B Models & 24GB VRAM Accessibility
New AI Models & Releases
LLaDA2.0 (100B/16B) Released: The latest diffusion language models, LLaDA2.0, ship in two MoE variants: a 100B-parameter model with 6B active parameters (flash) and a 16B-parameter model with 1B active parameters (mini), both optimized for practical applications. Both models are available on Hugging Face, with llama.cpp support in progress.
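Diffusion language models like LLaDA decode differently from autoregressive LLMs: they start from a fully masked sequence and iteratively commit the predictions they are most confident about. The toy sketch below illustrates that loop only; the `predict` stub is a hypothetical stand-in for a real denoiser, not LLaDA2.0's actual implementation.

```python
# Toy sketch of masked-diffusion decoding (illustrative, not the real model):
# start all-masked, then each step commit the most confident proposals.

MASK = "[MASK]"

def predict(seq):
    # Hypothetical denoiser stub: propose a token + confidence per position.
    # Confidence grows with position here so the demo is deterministic.
    target = ["diffusion", "models", "decode", "in", "parallel"]
    return [(target[i], (i + 1) / len(seq)) for i in range(len(seq))]

def diffusion_decode(length, steps):
    seq = [MASK] * length
    per_step = -(-length // steps)  # ceil: tokens to commit each step
    for _ in range(steps):
        proposals = predict(seq)
        # Rank still-masked positions by confidence, commit the top ones.
        masked = [i for i in range(length) if seq[i] == MASK]
        masked.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = proposals[i][0]
        if MASK not in seq:
            break
    return seq

print(diffusion_decode(5, 3))
# -> ['diffusion', 'models', 'decode', 'in', 'parallel']
```

Because several positions are committed per step, decoding can finish in far fewer steps than sequence length, which is the appeal of the diffusion approach.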
Flux 2 Now Runs on 24GB VRAM: The Flux 2 image-generation model can now run on consumer-grade GPUs with 24GB of VRAM, significantly broadening accessibility for users with mid-range hardware.
AI Tools & Libraries
Unsloth FP8 Reinforcement Learning for Local Training: Unsloth’s updated library enables FP8 reinforcement learning on local hardware with under 5GB of VRAM, supporting models like Qwen3-4B and Qwen3-1.7B. It optimizes training speed and context length on GPUs such as the RTX 40/50 series and H100/B200.
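The memory savings behind FP8 training are easy to ballpark: weights stored in one byte per parameter take half the space of BF16's two. The figures below are illustrative back-of-envelope math only; real footprints also include optimizer state, activations, and KV cache.

```python
# Back-of-envelope weight-memory math for different precisions (illustrative).
def weight_gib(n_params, bytes_per_param):
    """GiB needed to hold just the model weights at a given precision."""
    return n_params * bytes_per_param / 1024**3

qwen3_4b = 4e9  # ~4B parameters, as in Qwen3-4B

print(f"BF16: {weight_gib(qwen3_4b, 2):.1f} GiB")  # 2 bytes/param
print(f"FP8:  {weight_gib(qwen3_4b, 1):.1f} GiB")  # 1 byte/param: half the size
```

Halving weight memory is what makes squeezing a ~4B-parameter RL run onto a small local GPU plausible, alongside the other optimizations Unsloth applies.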
Splintr: High-Speed BPE Tokenizer in Rust: A new Rust-based BPE tokenizer, Splintr, offers 3-4x faster single-text encoding and 10-12x faster batch encoding compared to OpenAI’s tiktoken. It includes Python bindings and a streaming decoder for real-time LLM output.
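A streaming decoder matters because BPE tokens are byte sequences, so a single multi-byte UTF-8 character can be split across two tokens; emitting each token's bytes naively would print garbage mid-character. Splintr ships its own decoder; the sketch below just shows the underlying idea using Python's stdlib incremental decoder, and is not Splintr's API.

```python
# Streaming-decode byte chunks that may split a multi-byte UTF-8 character.
import codecs

def stream_decode(byte_chunks):
    dec = codecs.getincrementaldecoder("utf-8")()
    out = []
    for chunk in byte_chunks:
        text = dec.decode(chunk)   # buffers incomplete sequences internally
        if text:
            out.append(text)       # emit only complete characters
    out.append(dec.decode(b"", final=True))  # flush any remainder
    return "".join(out)

# "é" is two bytes (0xC3 0xA9); here it arrives split across two chunks:
print(stream_decode([b"caf", b"\xc3", b"\xa9", b"!"]))  # -> café!
```

In a real-time LLM pipeline the same buffering happens per token, so partial characters are held back until their remaining bytes arrive.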
AI Research & Policy
White House Launches "The Genesis Mission": A new federal initiative, "The Genesis Mission," aims to accelerate AI research by integrating scientific datasets to train foundation models and develop AI agents. The project raises discussions about open-source implications and potential regulatory shifts.
Conversational AI Improvements
TEN Turn Detection for Natural Voice AI Interactions: An open-source project, TEN Turn Detection, addresses interruptions in voice AI by better identifying when a user has finished speaking. The model is part of the TEN Framework and available on Hugging Face.
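TEN Turn Detection uses a trained model to judge whether an utterance is complete; for contrast, a naive rule-based baseline combines pause length with a check that the transcript doesn't end mid-thought. Everything below (thresholds, word list, function names) is an illustrative assumption, not part of the TEN Framework.

```python
# Naive rule-based end-of-turn baseline (illustrative; TEN uses a model).
# Declare end-of-turn only if the pause is long enough AND the transcript
# doesn't trail off on a conjunction or filler word.

INCOMPLETE_ENDINGS = {"and", "but", "so", "um", "uh", "because"}  # assumed list

def is_end_of_turn(transcript, silence_ms, min_silence_ms=700):
    if silence_ms < min_silence_ms:
        return False  # user may just be pausing briefly
    words = transcript.lower().rstrip(".?!,").split()
    # A trailing conjunction/filler suggests the speaker isn't done.
    return bool(words) and words[-1] not in INCOMPLETE_ENDINGS

print(is_end_of_turn("I want to book a flight", 900))      # True
print(is_end_of_turn("I want to book a flight and", 900))  # False
print(is_end_of_turn("I want to book a flight", 300))      # False
```

Rules like these break down on natural pauses mid-sentence and on languages with different discourse markers, which is exactly the gap a learned turn-detection model is meant to close.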