NVIDIA’s 47x Faster Jet-Nemotron & Lemonade’s Local LLM Router Boost AI Performance
New AI Models & Releases
GLM-4.6-GGUF Released: GLM-4.6 is now available as quantized GGUF files ranging from 1-bit to 4-bit, including a 2-bit version (135GB) and a 4-bit version (204GB). Earlier chat template issues have been fixed, so the files work with llama.cpp's llama-cli when run with the --jinja flag.
Liquid AI Releases LFM2-Audio-1.5B: A new end-to-end audio foundation model supporting ASR, TTS, and custom vocabulary, optimized for speed. The release includes a demo, a blog post, a GitHub repository, and Hugging Face resources.
Jet-Nemotron 2B/4B with 47x Faster Inference: NVlabs’ new models use FlashAttention2 and JetBlock to reach a reported 47x inference speedup on H100 GPUs. They are optimized for high-performance hardware; mobile support is pending.
AI Tools & Frameworks
Lemonade: Local OpenRouter for LLM Auto-Configuration: A local LLM server-router that auto-configures high-performance inference engines (e.g., FastFlowLM for AMD NPUs, Metal for macOS). Future plans include app integrations and expanded backend support.
Claudette Coding Agent (v5 Update): A modified coding agent configuration for free models (e.g., GPT-4.1 and GPT-5), focused on autonomous debugging, code cleanup, and positive language.
SpeakWithSQL: AI-Powered SQL Generator: A Node.js + Express tool that generates SQL from database metadata and user prompts using the OpenAI API.
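To make the SpeakWithSQL idea concrete, here is a minimal sketch of the "metadata plus user prompt" step. It is not the tool's actual code: the tool itself is Node.js + Express, while this sketch uses Python, and the function name build_sql_prompt, the schema layout, and the prompt wording are all hypothetical. It only assembles the text that would be sent to a model and makes no API call.

```python
from typing import Dict, List


def build_sql_prompt(schema: Dict[str, List[str]], question: str) -> str:
    """Render table metadata and a user question into one prompt string.

    Hypothetical helper for illustration; not taken from SpeakWithSQL.
    """
    lines = ["You translate questions into SQL.", "Schema:"]
    # Sort tables so the prompt is deterministic for a given schema.
    for table, columns in sorted(schema.items()):
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("Reply with one SQL statement and nothing else.")
    return "\n".join(lines)


# Hypothetical database metadata: table name -> column names.
schema = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "name"],
}
prompt = build_sql_prompt(schema, "What is the total spend per customer?")
print(prompt)
```

In a real service the resulting string would be passed to the language model, and the returned SQL should be reviewed or validated before it is run against a database.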
AI Industry & Partnerships
Visual Electric Joins Perplexity: The acquisition/partnership hints at enhanced image generation capabilities for Perplexity’s platform.