Google Debuts Gemma 4 and Gemini Beats Law Professors Amidst DeepSWE Benchmark Scrutiny
New Model Releases
Google Launches Gemma 4 12B and Teases Larger Variants: Google has officially released Gemma 4 12B, a multimodal, encoder-free model designed to run well on consumer-grade hardware. The model has a 256K context window and supports more than 140 languages. Initial community benchmarks, however, show it trailing the smaller Qwen3.5-9B, which won 5 of 8 shared benchmarks, despite Gemma's strong coding performance.
- New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both!
- More Gemma 4 models incoming
- google/gemma-4-12B · Hugging Face
- gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint
Research and Performance
Gemini 2.5 Pro Outperforms Law Professors in Stanford Study: A Stanford University study found that Google's Gemini 2.5 Pro beat 16 law professors at answering legal questions 75% of the time. The model's responses were rated higher and were less likely to be flagged as harmful, suggesting that LLMs are becoming viable tools for scalable evaluation in complex professional domains.
Industry Benchmarks
DeepSWE Benchmark Reliability Questioned Following Audit: A recent audit of the DeepSWE benchmark found significant flaws and suggested that the evaluation was rushed. According to the findings, the benchmark needs substantial improvement before it can be considered a reliable industry standard for measuring model quality.