GPT-5.4 Pro Solves Erdős Conjectures While Frontier Models Struggle With ARC-AGI-3
Large Language Model Developments
GPT-5.5 High and Opus 4.7 Benchmarked on ARC-AGI-3: New ARC-AGI-3 results put GPT-5.5 High at 0.43% and Opus 4.7 at 0.18%. Scores this low show how far current frontier models remain from the human-like reasoning the benchmark is designed to test.
GPT-5.4 Pro Solves Long-Standing Erdős Mathematical Conjectures: GPT-5.4 Pro generated a proof method that solved Erdős Problem #1196 and was subsequently applied to resolve another 60-year-old Erdős conjecture. The result highlights the growing usefulness of advanced AI models as aids in theoretical mathematics research.
Optimization and Local Inference
PFlash Enables 10x Prefill Speedup on Consumer GPUs: A new method called PFlash achieves a 10x prefill speedup over llama.cpp at a 128K context length on an RTX 3090. It combines speculative prefill with sparse attention to make local LLM inference substantially more efficient.
AI Applications
Mistral Medium 3.5 Powers Voice-Interactive Study App: A new study application uses Mistral Medium 3.5 to deliver an interactive learning experience through a high-performance voice mode. The app can generate flashcards and quizzes in real time while sustaining complex conversational flows and giving detailed explanations.