Anthropic’s ‘Mythos’ Emerges Amid Allegations of Data Contamination in Frontier AI Benchmarks
Large Language Models & Frontier Research
Anthropic Testing New 'Mythos' Model: Anthropic has begun testing a new high-end model named 'Mythos,' which reportedly surpasses the capabilities of its current Opus line. Part of a new 'Capybara' tier, the model shows major improvements in reasoning and coding, though its rollout is being handled cautiously because of its powerful cybersecurity capabilities.
ARC-AGI 3 Paper Questions Frontier Model Benchmarks: A new research paper alleges that frontier models, including Gemini 3, may have inflated their ARC-AGI scores through memorization of training data rather than true reasoning. This potential data contamination raises concerns about the validity of current benchmarks used to measure AI generalization.
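To make the contamination claim concrete: the worry is that benchmark items (or near-copies of them) appear in a model's training corpus, so a high score can reflect recall rather than reasoning. A minimal sketch of the kind of n-gram overlap screen commonly used for this, not the paper's actual methodology, might look like:

```python
# Hypothetical sketch of an n-gram overlap contamination check.
# This is NOT the ARC-AGI 3 paper's method; it illustrates the general
# idea of screening benchmark items against a training corpus.

def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_score(benchmark_item: str, training_docs: list[str], n: int = 8) -> float:
    """Fraction of the item's n-grams that also occur in the training corpus.

    A high score suggests the item (or a near-copy) was seen in training,
    so a strong result may reflect memorization rather than reasoning.
    """
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    corpus_grams = set()
    for doc in training_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)

# Example: an item that appears verbatim in a training document scores 1.0.
item = "rotate the blue grid pattern ninety degrees and mirror the result across the diagonal"
docs = ["... rotate the blue grid pattern ninety degrees and mirror the result across the diagonal ..."]
print(contamination_score(item, docs))  # 1.0 for a verbatim match
```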
Audio & Multimodal AI
Mistral AI Releases Voxtral TTS: Mistral AI has launched Voxtral TTS, a 3-billion-parameter open-weight text-to-speech model that supports nine languages. The model reportedly outperforms ElevenLabs Flash v2.5 in human preference tests and is optimized for local use, requiring only 3 GB of RAM.
- Mistral AI released Voxtral TTS, a 3-billion-parameter text-to-speech model with open weights that the company says outperformed ElevenLabs Flash v2.5 in human preference tests. The model runs on about 3 GB of RAM, achieves 90-millisecond time-to-first-audio, and supports nine languages (see the latency sketch after this list).
- [VIDEO] Voxtral 4B TTS 2603: Installation + 9-Language Demo
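The 90-millisecond time-to-first-audio figure cited above is a streaming-latency metric: the delay between submitting text and receiving the first audio chunk. A minimal measurement sketch, where `synthesize_stream` is a hypothetical stand-in for whatever streaming interface a local Voxtral TTS runtime exposes:

```python
import time

# Hypothetical sketch of measuring time-to-first-audio (TTFA) for a
# streaming TTS engine. `synthesize_stream` is a made-up placeholder,
# but the metric itself (request -> first audio chunk) is standard.

def time_to_first_audio(synthesize_stream, text: str) -> float:
    """Return seconds elapsed between the request and the first audio chunk."""
    start = time.perf_counter()
    for chunk in synthesize_stream(text):  # yields raw audio chunks
        return time.perf_counter() - start  # stop at the first chunk
    raise RuntimeError("stream produced no audio")

# Example with a dummy generator standing in for the real engine:
def dummy_stream(text):
    time.sleep(0.09)          # pretend the model needs ~90 ms to start
    yield b"\x00" * 3200      # first 100 ms of 16 kHz 16-bit mono audio

print(f"TTFA: {time_to_first_audio(dummy_stream, 'hello') * 1000:.0f} ms")
```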
AI Performance & Efficiency
Qwen 3.5 Achieves 1.1 Million Tokens Per Second: Google Cloud has demonstrated extreme scalability by running the Qwen 3.5 27B model at a throughput of over 1.1 million tokens per second. This was achieved using 96 NVIDIA B200 GPUs on Google Kubernetes Engine, showcasing significant advancements in inference speed for large-scale deployments.
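As a sanity check on that aggregate figure, the reported numbers imply roughly 11,500 tokens per second per GPU:

```python
# Back-of-the-envelope check on the reported aggregate throughput.
total_tokens_per_s = 1.1e6   # reported aggregate throughput
num_gpus = 96                # NVIDIA B200 GPUs on GKE

per_gpu = total_tokens_per_s / num_gpus
print(f"~{per_gpu:,.0f} tokens/s per GPU")          # ~11,458 tokens/s per GPU
print(f"~{total_tokens_per_s * 60:,.0f} tokens/min aggregate")  # ~66,000,000
```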
TurboQuant Integration in Llama.cpp: Google's new TurboQuant quantization method is being benchmarked in llama.cpp, offering a way to drastically shrink the KV cache. The technique promises more efficient local inference by enabling extreme compression with minimal impact on model quality; a generic sketch of the memory math behind KV-cache quantization follows the link list below.
- TurboQuant in Llama.cpp benchmarks
- https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression
- https://github.com/ggml-org/llama.cpp/discussions/20969
- https://github.com/Mintplex-Labs/anything-llm
- https://github.com/Blaizzy/mlx-vlm/pull/858
- https://github.com/vllm-project/vllm-omni/pull/2214
- https://developer.nvidia.com/blog/optimizing-inference-for-long-context-and-large-batch-sizes-with-nvfp4-kv-cache
- https://www.reddit.com/gallery/1s4bzo2
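TurboQuant's actual algorithm is described in the Google Research post linked above and is not reproduced here. As a generic illustration of why quantizing the KV cache saves memory, the sketch below round-trips a toy KV tensor through symmetric per-channel int4 quantization and compares byte counts:

```python
import numpy as np

# Generic per-channel 4-bit quantization sketch for a KV-cache tensor.
# This is NOT TurboQuant's algorithm; it only illustrates the memory math
# behind KV-cache compression: storing 4-bit codes plus one scale per
# channel instead of fp16 values.

def quantize_int4(x: np.ndarray):
    """Symmetric per-channel 4-bit quantization along the last axis."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0  # int4 range: [-8, 7]
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

# A toy KV-cache slice: (num_heads, seq_len, head_dim) in fp16.
kv = np.random.randn(8, 1024, 128).astype(np.float16)
q, scale = quantize_int4(kv.astype(np.float32))

fp16_bytes = kv.nbytes
int4_bytes = q.size // 2 + scale.size * 2  # 4-bit codes packed two per byte, fp16 scales
print(f"fp16 KV: {fp16_bytes} bytes, int4 KV: ~{int4_bytes} bytes "
      f"({fp16_bytes / int4_bytes:.1f}x smaller)")
print("max abs error:", np.abs(dequantize_int4(q, scale) - kv).max())
```

Real KV-cache quantizers differ in where they place scales, how they handle outliers, and whether keys and values get different treatment; the byte-count arithmetic above is the part that carries over.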