Google Releases Gemma 4 While GPT-5.5 Achieves Autonomous Protein Folding Breakthrough
Frontier Models & Hardware
GPT-5.4 and GPT-5.5 Progress and Upcoming Public Release: Cerebras's CFO says GPT-5.4 and GPT-5.5 are currently running internally on the company's chips and will be released to the public soon. Separately, GPT-5.5 reportedly spent more than 150 hours autonomously improving protein folding models, achieving significant performance gains on the SimplexFold model.
- Cerebras CFO says they are currently running GPT-5.4 and GPT-5.5 internally on their chips and will release them to the public soon. (Imagine that intelligence at that speed.)
- GPT-5.5 autonomously spent 150+ hours improving protein folding models.
Gemma 4 Releases & Fine-tunes
Google Announces Gemma 4 Series with Community Fine-tunes Already Emerging: Google's Jeff Dean has announced the release of Gemma 4, a model family ranging from edge-scale versions to a 124B-parameter Mixture of Experts (MoE) model. Meanwhile, the community has released "Gemma-4-Gembrain-31B-it-uncensored-heretic," a merge of multiple Gemma 4 31B instruction-tuned fine-tunes intended to improve logical and lateral thinking and creative prose, with a reported KLD of 0.0186 and 13/100 refusals.
- I hope that someday we will have a 124B Gemma.
- Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now, a Merge of Multiple Gemma 4 31B it Finetunes Designed to Boost Logical and Lateral Thinking for Improved Adherence, Increased Swipe Variety and Enhanced Creative Prose, With KLD of 0.0186 and 13/100 Refusals!
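For readers unfamiliar with the KLD figure quoted above, here is a minimal sketch of how such a number could be computed. The post does not say how its 0.0186 was measured, so the setup here is an assumption: the mean per-token KL divergence between the base model's and the modified model's next-token distributions, with the base model as the reference. The function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_kld(base_logits, tuned_logits):
    """Average KL(base || tuned) over token positions.

    Assumption: each argument is a list of per-position logit vectors
    produced by the two models on the same input text.
    """
    per_token = [kl_divergence(softmax(b), softmax(t))
                 for b, t in zip(base_logits, tuned_logits)]
    return sum(per_token) / len(per_token)
```

A value near zero means the modified model's token distributions stay close to the base model's on the evaluated text; larger values indicate more drift.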
Coding Agents & Tooling
SmallCode Agent Achieves 87% Benchmark Score Using 4B Model: A new coding agent named SmallCode reportedly outperforms major agents such as Cursor while using only a 4B-parameter local model. It relies on compound tools, improvement loops, and code graphs to stay reliable on smaller hardware.
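To illustrate what a "code graph" can look like, here is a minimal sketch that builds a function call graph from Python source using the standard `ast` module. SmallCode's actual implementation is not described in the post, so this is only one plausible form of the idea; `build_call_graph` is a hypothetical name.

```python
import ast

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the names it calls directly.

    Simplifications: only plain-name calls like foo() are recorded
    (method calls such as obj.foo() are skipped), and calls made inside
    a nested function are also attributed to the enclosing function.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return graph
```

A small model can be handed just the neighbours of the function it is editing instead of the whole file, which is one way a graph like this could keep prompts short.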
Inference & Optimization
Multi-Token Prediction (MTP) Optimizations Drive High-Speed Local Inference: New optimizations in llama.cpp aim to eliminate logit copying during prompt decoding, further speeding up its Multi-Token Prediction (MTP) feature. Users are reporting large speed increases, such as running the Qwen 3.6 27B model at up to 65 tokens per second on mid-range workstation GPUs.
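As a rough illustration of why MTP-style decoding can be faster, the toy sketch below reads MTP as multi-token prediction: several tokens are drafted per step and then verified against the main model. This is not llama.cpp's implementation. The `target` and `draft` callables are hypothetical stand-ins (in real MTP the drafts typically come from extra prediction heads on the same model), and a real system would score all drafted positions in one batched forward pass rather than calling the model per token.

```python
def greedy_decode(target, prompt, n_new):
    """Baseline: one target call (one 'pass') per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq

def draft_and_verify(target, draft, prompt, n_new, k=3):
    """Draft k tokens cheaply, then keep the prefix the target agrees with.

    Returns (sequence, passes), where passes counts verification rounds.
    The output matches greedy_decode exactly; only the pass count changes.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        proposal = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        passes += 1  # one verification round (batched in a real system)
        accepted = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong draft
                break
        else:
            # Every draft was accepted, so the target's next token is free.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes

# Toy models: the target counts upward mod 10; the draft errs after a 4.
def toy_target(seq):
    return (seq[-1] + 1) % 10

def toy_draft(seq):
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10
```

When most drafts are accepted, each verification round yields several tokens, so far fewer rounds are needed than in token-by-token decoding.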