31 May 2026

Claude Opus 4.8 Breaks Scientific Benchmarks as Wall-OSS-0.5 Advances Open Robotics

Robotics & Vision-Language Models

Wall-OSS-0.5 Open-Weights VLA Model Released: Wall-OSS-0.5 is a new open-weights Vision-Language-Action (VLA) model that can perform complex robotic tasks, such as opening lids and sorting items, directly from its pretrained checkpoint. The model reached over 80% task progress on several real-robot tests without any task-specific fine-tuning.

Open-weights VLA hits 80% task progress on 4 of 17 real-robot tasks with zero fine-tuning.

Benchmarks & Model Evaluation

Frontier Model Performance on DeepSWE and Singularity Gate Benchmarks: The latest benchmark results show "gpt-5-5" and Claude Opus 4.8 leading the field in coding and scientific discovery. Notably, Claude Opus 4.8 became the first model to surpass a 20% success rate on the Singularity Gate, a benchmark designed to test whether AI can predict scientific breakthroughs published after its training cutoff.

Model Releases & Quantization

Optimized and Distilled Variants of Qwen3.6-35B: Multiple optimized versions of Alibaba's Qwen3.6-35B model have been released, including a 4-bit NVFP4 version from NVIDIA that cuts GPU memory requirements by more than 3x. A separate community release offers a GGUF version with reasoning capabilities distilled from Claude 4.7, along with a multi-token prediction (MTP) head for faster self-speculative decoding.
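
As a rough sanity check on the "more than 3x" figure, the weight-only arithmetic can be sketched as below. Several things here are assumptions rather than details from the release: that "35B" means 35 billion weights, that the baseline is a 16-bit (BF16) checkpoint, that every weight is quantized, and that NVFP4 stores 4-bit values with one 8-bit scale per 16-element block (about 4.5 bits per weight). Activations, KV cache, and any layers kept in higher precision are ignored.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "35B" is read as 35 billion weights.
NUM_PARAMS = 35e9

# Assumption: the comparison baseline is a 16-bit (BF16) checkpoint.
BF16_BITS = 16.0

# Assumption: NVFP4 keeps 4-bit values plus one 8-bit scale shared by each
# block of 16 weights, so the effective cost is 4 + 8/16 = 4.5 bits per weight.
NVFP4_BITS = 4.0 + 8.0 / 16.0

bf16_gb = weight_memory_gb(NUM_PARAMS, BF16_BITS)
nvfp4_gb = weight_memory_gb(NUM_PARAMS, NVFP4_BITS)
reduction = bf16_gb / nvfp4_gb

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
print(f"Reduction:     {reduction:.2f}x")
```

Under these assumptions the weights shrink from roughly 70 GB to just under 20 GB, which is consistent with a reduction of more than 3x. Real deployments will differ depending on which layers are left unquantized and how much memory the KV cache takes.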