Alibaba’s Qwen3-VL Models Launch with Cookbooks as Firefox Embraces Perplexity AI Search
New AI Models & Releases
Qwen3-VL-4B and 8B Instruct & Thinking Models Released: Alibaba’s Qwen team has launched the Qwen3-VL-4B and 8B models in both Instruct and Thinking variants. These vision-language models are optimized for local deployment via MLX, GGUF, and NexaML/NexaSDK, with Hugging Face hosting all versions. Benchmarks highlight competitive performance across tasks like STEM, text recognition, and subjective reasoning.
- Qwen3-VL-4B and 8B Instruct & Thinking are here
- Qwen3-VL 4B vs 8B vs 235B
- Benchmark comparison against GPT-4V and Gemini2 across 15+ vision-language tasks.
Developer Tools & Frameworks
OpenAI’s Codex CLI to Introduce "Plan Mode": A leaked video reveals OpenAI is adding a planning phase to its Codex CLI tool, so that code changes are planned first and then executed in a structured way, similar to tools like Cline. The feature aims to improve workflow efficiency by analyzing tasks before implementation.
AI Applications & Integrations
Firefox Integrates Perplexity’s AI Search Engine: Mozilla has added Perplexity’s AI answer engine as a native search option in Firefox, allowing users to access AI-powered summaries and responses directly from the browser.
Real-Time AI Study Buddy with Screen Interaction: A developer demoed an open-source study assistant that combines Qwen3-VL (visual analysis), Parakeet (STT), and Orpheus (TTS) to interact with on-screen content, describe diagrams, and generate study guides. Future updates may include PDF summarization.
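As a rough illustration of how such an assistant could be wired together, the sketch below chains three stages: speech recognition, visual analysis of a screenshot, and speech synthesis. It is not the demoed project's code. The `StudyBuddy` class, its field names, and the `fake_*` stub functions are all hypothetical, and the stubs stand in for real models so the example runs without any downloads.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical pipeline: every name here is an illustrative assumption,
# not an API of the demoed project or of any model library.
@dataclass
class StudyBuddy:
    transcribe: Callable[[bytes], str]            # speech-to-text stage
    describe_screen: Callable[[bytes, str], str]  # vision-language stage (e.g. Qwen3-VL)
    speak: Callable[[str], bytes]                 # text-to-speech stage

    def answer(self, audio: bytes, screenshot: bytes) -> bytes:
        """Turn a spoken question about the screen into a spoken reply."""
        question = self.transcribe(audio)
        reply = self.describe_screen(screenshot, question)
        return self.speak(reply)


# Stub stages so the sketch runs on its own; real models would replace these.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_vlm(screenshot: bytes, question: str) -> str:
    return f"[{len(screenshot)} bytes of screen] answer to: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


buddy = StudyBuddy(fake_stt, fake_vlm, fake_tts)
spoken = buddy.answer(b"What does this diagram show?", b"\x89PNG")
print(spoken.decode("utf-8"))
```

Keeping each stage behind a plain callable means any one model can be swapped out without touching the other two, which suits a project that mixes separately developed speech and vision models.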
Documentation & Resources
Qwen3-VL Cookbooks Released for Advanced Use Cases: The Qwen team published GitHub cookbooks with step-by-step guides for recognition, localization, document parsing, video understanding, 3D grounding, and more. Each notebook includes Colab links for easy testing.