02 May 2026

GPT-5.4 Pro Solves Erdos Conjectures While Frontier Models Struggle With ARC-AGI-3

Large Language Model Developments

GPT-5.5 High and Opus 4.7 Benchmarked on ARC-AGI-3: New ARC-AGI-3 results put GPT-5.5 High at 0.43% and Opus 4.7 at 0.18%. Scores this low show how difficult the benchmark remains for current frontier models.

ARC-AGI-3 Update (GPT-5.5 High and Opus 4.7)

GPT-5.4 Pro Solves Long-Standing Erdos Mathematical Conjectures: GPT-5.4 Pro generated a proof of Erdos Problem #1196, and the method from that proof has since been applied to other problems, including another 60-year-old Erdos conjecture. The result points to the growing usefulness of advanced AI models in theoretical mathematics research.

UPDATE: The method from the proof generated by GPT-5.4 Pro for Erdos Problem #1196 was successfully applied to other problems including another 60 year old Erdos conjecture.

Optimization and Local Inference

PFlash Enables 10x Prefill Speedup on Consumer GPUs: A new method called PFlash reports a 10x prefill speedup over llama.cpp at a 128K context length on an RTX 3090. It combines speculative prefill with sparse attention to make local LLM inference more efficient.

PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090
- https://github.com/Luce-Org/lucebox-hub
- https://arxiv.org/abs/2502.02789
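
For readers new to speculative prefill, here is a rough sketch of the token-selection step. It is not PFlash's code. The function name, the parameters, and the assumption that per-token importance scores come from a small draft model are all illustrative. The idea is to keep only the highest-scoring prompt tokens, plus the end of the prompt, so the large model prefills a much shorter sequence.

```python
from typing import List, Sequence


def select_prefill_tokens(
    scores: Sequence[float],
    keep_ratio: float = 0.1,
    always_keep_last: int = 4,
) -> List[int]:
    """Return the sorted positions of prompt tokens to keep for prefill.

    `scores` holds one importance value per prompt token. Here they are
    assumed to come from a small draft model (a hypothetical setup).
    """
    n = len(scores)
    if n == 0:
        return []
    # Total number of tokens we would like to keep.
    budget = max(1, int(n * keep_ratio))
    # The tail of the prompt (often the actual question) is always kept.
    tail = set(range(max(0, n - always_keep_last), n))
    # Rank the remaining positions by importance, highest first.
    ranked = sorted(
        (i for i in range(n) if i not in tail),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Fill whatever budget is left after the tail with the top-ranked tokens.
    kept = tail | set(ranked[: max(0, budget - len(tail))])
    # Return positions in original order so the prompt stays in sequence.
    return sorted(kept)


scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.0, 0.0]
print(select_prefill_tokens(scores, keep_ratio=0.5, always_keep_last=2))
# → [1, 3, 6, 7]
```

With a small keep ratio at a 128K context, the large model would process only a fraction of the prompt, which is where a prefill speedup of this kind would come from. The actual selection rule and the sparse attention kernels used by PFlash may differ.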

AI Applications

Mistral Medium 3.5 Powers Voice-Interactive Study App: A new study app uses Mistral Medium 3.5 to build an interactive learning experience around a responsive voice mode. The app can generate flashcards and quizzes in real time while keeping up with complex conversations and giving detailed explanations.

Built a study app with Mistral Medium 3.5 the voice mode alone is crazy