07 May 2026 1 min read

Subquadratic Claims 1,000x Scaling Efficiency as Zyphra Debuts AMD-Trained Frontier Model

Breakthrough Architectures & Funding

Subquadratic Claims 1,000x Cost Reduction for LLM Scaling: Startup Subquadratic, founded by former DeepMind and Meta engineers, announced a new architecture designed to break current LLM scaling limits by reducing processing costs by up to 1,000x. The company secured $29 million in seed funding at a $500 million valuation, though the scientific community is currently calling for peer review to verify these claims.

Subquadratic claims to break LLM scaling limits! 1000x less costs
- https://subq.ai

New Language Models

Zyphra Releases ZAYA1-8B Model Trained on AMD: Zyphra has launched ZAYA1-8B, a model boasting high intelligence density that aims to compete with larger frontier reasoning models like Claude 4.5 Sonnet and Gemini 2.5 Pro. Notably, the model was pretrained entirely on AMD hardware and networking using a cluster of 1,024 MI300x nodes.

ZAYA1-8B: Frontier intelligence density, trained on AMD
- https://www.zyphra.com/post/zaya1-8b

Inference & Efficiency

Qwen 3.6 27B Gains 2.5x Inference Speed via MTP Support: A new update to llama.cpp introduces Multi-Token Prediction (MTP) support for Qwen 3.6 27B, significantly boosting performance for local agentic coding tasks. This optimization allows the model to run at speeds of 28 tokens per second on consumer hardware like the M2 Max while maintaining a 262k context window.

2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints

ParoQuant Method Introduced for Efficient LLM Reasoning: Z-lab has released ParoQuant, a pairwise rotation quantization technique designed to optimize reasoning LLM inference while maintaining accuracy close to FP16 levels. The method has shown promising results in performance and speedup when compared to other popular quantization methods like AWQ and QTIP.

ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference