
NVIDIA’s 47x Faster Jet-Nemotron & Lemonade’s Local LLM Router Boost AI Performance

New AI Models & Releases

GLM-4.6-GGUF Released: The latest version of the GLM model is now available in quantized GGUF formats from 1-bit to 4-bit, including 2-bit (135GB) and 4-bit (204GB) builds. Chat-template issues have been fixed, so the files now work with llama.cpp's llama-cli --jinja.
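For readers who want to script against the quantized files rather than use llama-cli directly, here is a minimal sketch with the llama-cpp-python bindings; the shard filename, context size, and GPU-offload setting are placeholders, not the actual repo layout.

```python
# Sketch: loading a quantized GLM-4.6 GGUF with llama-cpp-python.
# The model path, context size, and GPU layer count are placeholders;
# point model_path at the first shard of the split GGUF you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4.6-Q2_K-00001-of-00003.gguf",  # hypothetical shard name
    n_ctx=8192,         # context window to allocate
    n_gpu_layers=-1,    # offload every layer to GPU if VRAM allows
)

# create_chat_completion() applies the chat template stored in the GGUF,
# roughly analogous to running llama-cli with --jinja.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize GGUF quantization in one sentence."}]
)
print(out["choices"][0]["message"]["content"])
```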


Liquid AI Releases LFM2-Audio-1.5: A new end-to-end audio foundation model supporting ASR, TTS, and custom vocabulary, optimized for speed. Includes demo, blog, GitHub, and Hugging Face resources.


Jet-Nemotron 2B/4B with 47x Faster Inference: NVlabs’ new models leverage FlashAttention2 and JetBlock for accelerated inference on H100 GPUs. Optimized for high-performance hardware; mobile support pending.
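Assuming the checkpoints follow the usual Hugging Face layout, loading one should look roughly like the sketch below; the repo id is a guess, the custom JetBlock layers will need trust_remote_code, and FlashAttention2 requires the flash-attn package on an Ampere-or-newer GPU.

```python
# Sketch: loading a Jet-Nemotron checkpoint with Hugging Face transformers.
# The repo id is hypothetical; check the official NVlabs release for the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "jet-ai/Jet-Nemotron-2B"  # hypothetical repo id
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs flash-attn installed
    trust_remote_code=True,                   # JetBlock is custom modeling code
).to("cuda")

inputs = tok("Explain linear attention in one sentence.", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```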


AI Tools & Frameworks

Lemonade: Local OpenRouter for LLM Auto-Configuration: A local LLM server-router that auto-configures high-performance inference engines (e.g., FastFlowLM for AMD NPUs, Metal for macOS). Future plans include app integrations and expanded backend support.
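Since it bills itself as a local OpenRouter, Lemonade presumably exposes an OpenAI-compatible endpoint, so the standard OpenAI client should work against it; the base URL, port, and model name below are assumptions to be replaced with whatever the server reports at startup.

```python
# Sketch: querying a local Lemonade server through the standard OpenAI client.
# Base URL, port, and model id are assumptions; check Lemonade's startup output.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",  # assumed local endpoint
    api_key="not-needed-locally",             # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="Llama-3.2-3B-Instruct",  # placeholder: whichever model the router configured
    messages=[{"role": "user", "content": "Which inference backend are you running on?"}],
)
print(resp.choices[0].message.content)
```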


Claudette Coding Agent (v5 Update): A modified coding-agent configuration aimed at free-to-use models (e.g., GPT-4.1/GPT-5), focused on autonomous debugging, code cleanup, and positive language.


SpeakWithSQL: AI-Powered SQL Generator: A Node.js + Express tool that generates SQL from database metadata and user prompts using the OpenAI API.
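SpeakWithSQL itself is Node.js + Express, but the underlying pattern, sending table metadata plus a natural-language question to a chat completion and asking for SQL back, is simple to sketch. The schema, prompt wording, and model name below are illustrative only and not taken from the project.

```python
# Sketch of the schema-plus-question prompting pattern behind tools like SpeakWithSQL
# (shown in Python for brevity; the actual project is Node.js + Express).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

schema = """
orders(id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC, created_at TIMESTAMP)
customers(id INTEGER PRIMARY KEY, name TEXT, country TEXT)
"""

question = "Total revenue per country for 2024, highest first."

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model id
    messages=[
        {"role": "system", "content": "Translate the question into SQL for the given schema. Reply with SQL only."},
        {"role": "user", "content": f"Schema:\n{schema}\nQuestion: {question}"},
    ],
)
print(resp.choices[0].message.content)
```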


AI Industry & Partnerships

Visual Electric Joins Perplexity: The acquisition/partnership hints at enhanced image generation capabilities for Perplexity’s platform.