LLaDA2.0 & Flux 2 Break Barriers: 100B Models & 24GB VRAM Accessibility

New AI Models & Releases

LLaDA2.0 (100B/16B) Released: LLaDA2.0, the latest family of diffusion language models, ships in two MoE variants optimized for practical applications: a flash version with 100B total/6B active parameters and a mini version with 16B total/1B active parameters. Both models are available on Hugging Face, with llama.cpp support in progress.
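
For readers who want to try the checkpoints, a minimal loading sketch with Hugging Face transformers might look like the following. The repo id below and the need for trust_remote_code are assumptions based on how similar diffusion LMs are published; check the model cards for the exact ids.

```python
# Sketch: loading the mini variant from Hugging Face with transformers.
# The repo id "inclusionAI/LLaDA2.0-mini" is an assumption; diffusion LMs
# typically ship custom generation code, hence trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/LLaDA2.0-mini"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 16B/1B mini within a single-GPU budget
    device_map="auto",
    trust_remote_code=True,
)
```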


Flux 2 Now Runs on 24GB VRAM: The Flux 2 image-generation model can now run on consumer GPUs with 24GB of VRAM, significantly broadening accessibility for users without datacenter-class hardware.
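
A common recipe for fitting a pipeline this large into a 24GB budget is bfloat16 weights plus CPU offloading of idle submodules. The sketch below uses diffusers in that pattern; the repo id is an assumption, and the exact pipeline class and memory footprint depend on the release.

```python
# Sketch: running Flux 2 within a 24GB VRAM budget via diffusers.
# The repo id "black-forest-labs/FLUX.2-dev" is an assumption; verify
# against the official release.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",  # assumed repo id
    torch_dtype=torch.bfloat16,      # halves memory vs. fp32 weights
)
pipe.enable_model_cpu_offload()      # keeps idle submodules in system RAM

image = pipe(
    prompt="a lighthouse at dawn, volumetric light",
    num_inference_steps=28,
    guidance_scale=4.0,
).images[0]
image.save("flux2_sample.png")
```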


AI Tools & Libraries

Unsloth FP8 Reinforcement Learning for Local Training: Unsloth’s latest release enables FP8 reinforcement learning on local hardware in under 5GB of VRAM, supporting models such as Qwen3-4B and Qwen3-1.7B. The FP8 path improves training speed and usable context length on GPUs with native FP8 support, such as the RTX 40/50 series and H100/B200.
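
As a rough sketch of what the loading step looks like, assuming Unsloth’s usual FastLanguageModel API: the FP8 flag name below is an assumption, so consult the release notes for the exact switch.

```python
# Sketch of loading a model for FP8 RL with Unsloth. The flag name
# load_in_fp8 is an assumption; FastLanguageModel.from_pretrained with
# model_name/max_seq_length follows Unsloth's documented pattern.
# FP8 kernels need hardware support (RTX 40/50 series, H100/B200).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B",  # assumed repo id
    max_seq_length=4096,
    load_in_fp8=True,               # assumed flag enabling the FP8 path
)
# From here the model would typically be handed to an RL trainer
# (e.g. TRL's GRPOTrainer) for the reinforcement learning loop.
```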


Splintr: High-Speed BPE Tokenizer in Rust: Splintr, a new Rust-based BPE tokenizer, delivers 3-4x faster single-text encoding and 10-12x faster batch encoding than OpenAI’s tiktoken. It ships with Python bindings and a streaming decoder for real-time LLM output.
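
A hypothetical usage sketch of the Python bindings follows. Every name below (Tokenizer, encode_batch, the streaming-decoder methods) is an assumption meant only to illustrate the advertised features, not Splintr’s actual API.

```python
# Hypothetical sketch of Splintr's Python bindings; all names here are
# assumptions illustrating the advertised features, not the real API.
from splintr import Tokenizer  # assumed binding name

tok = Tokenizer("cl100k_base")            # assumed tiktoken-style vocab name
ids = tok.encode("hello world")           # single-text encode (claimed 3-4x faster)
batch = tok.encode_batch(["foo", "bar"])  # batch encode (claimed 10-12x faster)

# A streaming decoder buffers partial multi-byte UTF-8 sequences so that
# token-by-token LLM output can be printed without garbled characters.
dec = tok.stream_decoder()                # assumed streaming API
for tid in ids:
    chunk = dec.feed(tid)                 # returns complete text, or ""
    if chunk:
        print(chunk, end="", flush=True)
```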


AI Research & Policy

White House Launches "The Genesis Mission": A new federal initiative, "The Genesis Mission," aims to accelerate AI research by integrating scientific datasets to train foundation models and develop AI agents. The project has sparked discussion about open-source implications and potential regulatory shifts.


Conversational AI Improvements

TEN Turn Detection for Natural Voice AI Interactions: TEN Turn Detection, an open-source project, reduces awkward interruptions in voice AI by more reliably detecting when a user has actually finished speaking. The model is part of the TEN Framework and is available on Hugging Face.
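
Since the model is published as a standard Hugging Face checkpoint, inference can be sketched with transformers as below. The repo id and the label set ("finished"/"unfinished"/"wait") are assumptions drawn from the project description; verify them against the model card.

```python
# Sketch: classifying whether an utterance is complete with the TEN turn
# detection model. Repo id and output labels are assumptions; check the
# model card for the exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TEN-framework/TEN_Turn_Detection"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Feed the user's partial utterance; the model emits a short label.
messages = [{"role": "user", "content": "So what I was thinking is"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=4)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
# assumed output here: "unfinished" (the speaker has not completed the turn)
```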