GPT-5 Cracks Math Conjectures & Deception Tests as Alibaba Eyes 10-Trillion-Parameter AI
AI Models & Research
GPT-5 Advancements: Solving Math Conjectures and Social Intelligence
- GPT-5 demonstrates progress in solving minor open math problems (typically requiring PhD-level effort) and shows "flashes of originality" in reasoning, though it struggles with cross-paper synthesis. A study titled "Gödel Test: Can Large Language Models Solve Easy Conjectures?" evaluates its performance on five conjectures.
- In a separate study, GPT-5 outperformed other AI models at the social deduction game Among Us, showcasing advanced deception, persuasion, and theory-of-mind skills.
OpenAI’s New "Alpha Models" for Pro Users
- OpenAI introduced exclusive "Alpha Models" for Pro subscribers, offering enhanced performance and advanced features. Details on capabilities remain limited.
GPT-5-Codex: Enhanced Coding Reasoning
- Users report that GPT-5-Codex excels at coding tasks, citing improved reasoning and better recovery from its own errors.
Alibaba’s Ambitious Qwen Roadmap
- Alibaba unveiled plans to scale Qwen models aggressively, including:
  - Context length: 1M → 100M tokens
  - Parameters: 1 trillion → 10 trillion
  - Focus areas: synthetic data generation, agent capabilities, and test-time compute scaling
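To put the roadmap figures in perspective, here is a rough back-of-the-envelope sketch. It assumes dense weights stored at 2 bytes per parameter (16-bit precision); the roadmap does not state a precision or architecture, so the storage figure is illustrative only. The helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the Qwen roadmap targets.
# ASSUMPTION: dense weights at 2 bytes per parameter (16-bit);
# the roadmap itself does not specify precision or architecture.

def scale_factor(current: int, target: int) -> float:
    """Return how many times larger the target is than the current value."""
    return target / current

def weight_storage_tb(params: int, bytes_per_param: int = 2) -> float:
    """Raw weight storage in terabytes (1 TB = 10**12 bytes)."""
    return params * bytes_per_param / 10**12

context_growth = scale_factor(1_000_000, 100_000_000)   # 1M -> 100M tokens
param_growth = scale_factor(10**12, 10 * 10**12)        # 1T -> 10T parameters
storage_tb = weight_storage_tb(10 * 10**12)             # 10T params at 2 bytes each

print(f"Context length grows {context_growth:.0f}x")    # 100x
print(f"Parameter count grows {param_growth:.0f}x")     # 10x
print(f"Weights alone: about {storage_tb:.0f} TB")      # about 20 TB
```

Under that assumption, the weights of a 10-trillion-parameter model would occupy roughly 20 TB before counting activations, optimizer state, or KV cache.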
AI Hardware & Infrastructure
China’s Fenghua No.3 GPU: CUDA-Compatible with 112GB+ HBM
- China’s latest GPU, Fenghua No.3, claims CUDA compatibility, ray tracing (RT) support, and 112GB+ of HBM, positioning it as a competitor in AI hardware.
AI Agents & Local Deployment
Fully Local AI Agent for Raspberry Pi 5
- A developer built a lightweight, fully local AI agent for Raspberry Pi 5 using small models (Qwen3:1.7B, Gemma3:1B). The agent handles wake-word detection, transcription, and LLM inference on-device.
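The three on-device stages described above (wake-word detection, transcription, LLM inference) can be sketched as a simple pipeline. This is a minimal illustration, not the developer's actual code: `run_turn` and `ask_local_llm` are hypothetical names, and the sketch assumes the models are served by a local Ollama instance on its default port, which the source does not confirm. The wake-word detector and transcriber are passed in as plain callables so any on-device engine can be plugged in.

```python
import json
import urllib.request
from typing import Callable, Optional

# ASSUMPTION: models are served locally by Ollama on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_local_llm(prompt: str, model: str = "qwen3:1.7b",
                  url: str = OLLAMA_URL) -> str:
    """Send a prompt to a local Ollama server and return the full reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]

def run_turn(
    audio: bytes,
    detect_wake_word: Callable[[bytes], bool],
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str] = ask_local_llm,
) -> Optional[str]:
    """Run one agent turn: wake word -> transcription -> LLM reply.

    Returns None if the wake word is absent or nothing was transcribed.
    """
    if not detect_wake_word(audio):
        return None              # stay idle until the wake word is heard
    text = transcribe(audio).strip()
    if not text:
        return None              # nothing intelligible was said
    return generate(text)
```

Keeping each stage behind a callable means the small models named above can be swapped without touching the control flow, and the pipeline can be exercised with stand-in functions when no microphone or model server is available.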
AI in Visual Reasoning
Google’s Veo 3: Chain-of-Frames for Visual Tasks
- Google’s Veo 3 introduces Chain-of-Frames, a visual analog to Chain-of-Thought, enabling diffusion models to tackle complex visual reasoning tasks (e.g., ARC-AGI, ClockBench) without relying on multimodal LLMs.
Enterprise & Sovereign AI
Microsoft, OpenAI, and SAP Partner for Germany’s Public Sector AI
- A collaboration between Microsoft, OpenAI, and SAP will deploy sovereign AI in Germany’s public sector by 2026, using Microsoft Azure and SAP’s Delos Cloud to ensure data privacy and compliance.
Anthropic’s Claude Integrated into Microsoft 365 Copilot
- Claude (Anthropic) is now available within Microsoft 365 Copilot, enhancing the platform’s AI capabilities.