Claude Opus 4.8 Breaks Scientific Benchmarks as Wall-OSS-0.5 Advances Open Robotics
Robotics & Vision-Language Models
Wall-OSS-0.5 Open-Weights VLA Model Released: Wall-OSS-0.5 is a new open-weights Vision-Language-Action (VLA) model that can perform complex robotic tasks, such as opening lids and sorting items, directly from its pretrained checkpoint. The model reached over 80% task progress on several real-robot tests without any task-specific fine-tuning.
Benchmarks & Model Evaluation
Frontier Model Performance on DeepSWE and Singularity Gate Benchmarks: The latest benchmark results show "gpt-5-5" and Claude Opus 4.8 leading the field in coding and scientific discovery. Notably, Claude Opus 4.8 became the first model to surpass a 20% success rate on the Singularity Gate, a benchmark designed to test whether AI can predict scientific breakthroughs published after its training cutoff.
- DeepSWE Opus 4.8 results have been released.
- Opus 4.8 Leads the Singularity Gate: New Benchmark for AI predicting paradigm-breaking scientific discoveries after model training cutoff
Model Releases & Quantization
Optimized and Distilled Variants of Qwen3.6-35B: Multiple optimized versions of Alibaba's Qwen3.6-35B model have been released, including a 4-bit NVFP4 version from NVIDIA that cuts GPU memory requirements by more than 3x. A separate community release is a GGUF build with reasoning distilled from Claude 4.7 Opus; it includes a multi-token prediction (MTP) head for faster self-speculative decoding.
- nvidia/Qwen3.6-35B-A3B-NVFP4 · Hugging Face
- mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF just released !
- https://huggingface.co/mudler/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-APEX-MTP-GGUF#qwen36-35b-a3b-claude-47-opus-reasoning-distilled--apex-mtp-gguf
- https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
- https://github.com/ggml-org/llama.cpp/pull/22673
- https://github.com/mudler/llama.cpp/tree/mtp-imatrix
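For readers who want to sanity-check the "more than 3x" memory figure, here is a rough back-of-the-envelope sketch. It makes several assumptions that do not come from the release notes: that the baseline is 16-bit weights, that the model has roughly 35 billion parameters (taken from its name), that every weight is quantized, and that NVFP4 costs about 4.5 bits per weight (4-bit values plus one 8-bit scale shared by each block of 16 weights). Real checkpoints typically keep some layers in higher precision, so actual savings will differ.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: parameter count inferred from the "35B" in the model name.
NUM_PARAMS = 35e9

# Assumption: the unquantized baseline stores weights in 16 bits.
BASELINE_BITS = 16.0

# Assumption: 4-bit values plus one 8-bit scale per block of 16 weights.
NVFP4_BITS = 4.0 + 8.0 / 16.0

baseline_gb = weight_memory_gb(NUM_PARAMS, BASELINE_BITS)
nvfp4_gb = weight_memory_gb(NUM_PARAMS, NVFP4_BITS)
ratio = baseline_gb / nvfp4_gb

print(f"16-bit weights: {baseline_gb:.1f} GB")
print(f"NVFP4 weights:  {nvfp4_gb:.1f} GB")
print(f"Reduction:      {ratio:.2f}x")
```

Under these assumptions the estimate covers weights only; KV cache and activations add to the real GPU footprint.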