04 May 2026

Google AI Studio Leaks Advanced Models While Gemma4 Benchmarks Reveal Mobile Hardware Constraints

New Models & Benchmarks

Gemma4 E4B Benchmarks on Mobile Hardware: Benchmark results for Gemma4 E4B on an iPhone 16 Pro show a significant performance gap between CPU and GPU inference. The results suggest that memory bandwidth, rather than raw compute, remains a primary bottleneck for AI inference on mobile devices.

These are the benchmark results for Gemma4 E4B tested on my iPhone 16 Pro.
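To make the memory-bottleneck point concrete, here is a rough back-of-the-envelope sketch. It assumes decoding is fully memory-bound, meaning every generated token requires streaming all model weights from memory once. The function name and both input figures are hypothetical placeholders, not numbers from the benchmark post.

```python
def max_decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if each token streams all weights from memory once."""
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical figures for illustration only; neither is a measurement.
ASSUMED_WEIGHT_BYTES = 4e9   # a model occupying 4 GB in memory
ASSUMED_BANDWIDTH = 50e9     # 50 GB/s of usable memory bandwidth

ceiling = max_decode_tokens_per_second(ASSUMED_WEIGHT_BYTES, ASSUMED_BANDWIDTH)
print(f"bandwidth-bound ceiling: {ceiling:.1f} tokens/s")
```

This estimate ignores KV-cache traffic, on-chip caching, and compute limits, so real throughput will generally be lower. The point is only that under this assumption the ceiling scales with bandwidth divided by model size, whichever processor does the arithmetic.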

Potential New Model Leak in AI Studio: Users report that a new model has appeared in A/B tests inside Google's AI Studio and can generate highly complex, detailed SVG images, including mathematical derivations. This may point to an upcoming Flash or Pro model with stronger technical visual capabilities.

Got this SVG from A/B test window inside AI Studio. Still can't believe this is an SVG. Most likely the new flash/pro model.

Research & Development

"Second Thoughts" Refinement Loop for Small Models: An experimental approach called "Second Thoughts" aims to improve small language models by adding a small secondary transformer that reads output near the end of generation and feeds it back near the top as a refinement loop. According to the author, a quick test on a 1.7B-parameter model showed drastic improvement on focused tasks such as coding.

"Second Thoughts": Been playing with adding a small transformer that reads output near the end of generation and feeds it back near the top as a refinement loop. A quick test on a 1.7B model showed drastic improvement in focused tasks (like coding).
- https://dnhkng.github.io/posts/rys
- https://bigattichouse.medium.com/second-thoughts-improving-small-llms-with-bidirectional-refinement-loops-part-1-fa5ab51af656?sk=907cce272a3aed0eb3f1e3a0669a3964
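The control flow of such a refinement loop can be sketched in a few lines. This is a schematic under stated assumptions, not the author's implementation: it reads "near the end" and "near the top" as positions in the layer stack, and `forward_with_refinement`, `inject_at`, `read_at`, and the toy layers are all hypothetical names. The real model layers and the small refiner transformer are replaced by plain functions on lists of floats.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_layers(layers: Sequence[Layer], h: Vector) -> Vector:
    """Apply a sequence of layers in order."""
    for layer in layers:
        h = layer(h)
    return h


def forward_with_refinement(layers: Sequence[Layer], refiner: Layer, x: Vector,
                            inject_at: int, read_at: int, passes: int = 1) -> Vector:
    """Run the stack, then let `refiner` read a late state and nudge an early one."""
    early = run_layers(layers[:inject_at], x)            # state near the top
    late = run_layers(layers[inject_at:read_at], early)  # state near the end
    for _ in range(passes):
        feedback = refiner(late)                         # stand-in for the small transformer
        refined = [e + f for e, f in zip(early, feedback)]  # residual-style injection
        late = run_layers(layers[inject_at:read_at], refined)
    return run_layers(layers[read_at:], late)


# Toy stand-ins (assumptions): simple arithmetic instead of transformer blocks.
def add_one(h: Vector) -> Vector:
    return [v + 1 for v in h]

def double(h: Vector) -> Vector:
    return [2 * v for v in h]

def minus_three(h: Vector) -> Vector:
    return [v - 3 for v in h]

def halve(h: Vector) -> Vector:
    return [0.5 * v for v in h]

layers = [add_one, double, minus_three]
plain = forward_with_refinement(layers, halve, [1.0, 2.0], inject_at=1, read_at=2, passes=0)
refined = forward_with_refinement(layers, halve, [1.0, 2.0], inject_at=1, read_at=2, passes=1)
print(plain, refined)
```

With `passes=0` the function reduces to an ordinary forward pass, which makes it easy to compare outputs with and without the loop. How the actual refiner is trained and where it attaches are described in the linked posts, not here.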

Advancements in Model Quantization Standards: Technical discussions suggest that standard llama.cpp quantization may have quality and stability problems, prompting some users to shift toward AutoRound quantization at lower bit widths. Higher-quality quantization methods are becoming critical for preserving model performance and stability in local deployments.

Llama.cpp quantization is broken
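To see why lower bit widths put more pressure on the quantization method, here is a minimal sketch of plain symmetric round-to-nearest quantization with a single scale per tensor. This is neither llama.cpp's quantization format nor AutoRound; it is the simplest baseline, and the weight values and helper names are made up for illustration.

```python
from typing import Dict, List


def quantize_rtn(weights: List[float], bits: int) -> List[float]:
    """Quantize to signed `bits`-bit integers with one scale, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp to the signed range, map back.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Made-up weights for illustration only.
weights = [0.91, -0.42, 0.07, -0.88, 0.33, 0.15, -0.61, 0.02]

errors: Dict[int, float] = {}
for bits in (8, 4, 2):
    errors[bits] = mean_abs_error(weights, quantize_rtn(weights, bits))
    print(f"{bits}-bit mean absolute error: {errors[bits]:.4f}")
```

On these toy weights the reconstruction error grows as the bit width drops, which is the gap that more careful rounding and scaling schemes try to close.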

Cybersecurity

Rising Security Vulnerabilities Linked to AI: Recent data indicates a massive spike in cryptocurrency hacks in 2026, suggesting that AI advancements may be facilitating more frequent and effective cyberattacks. This trend highlights the urgent need for AI-driven security solutions to counter these emerging threats.

When AI hits security there will be signs