Anthropic Defies Pentagon Over Safety Guardrails as Researchers Identify Hallucination-Causing "H-Neurons"
AI Safety and Research
Discovery of "H-Neurons" Linked to LLM Hallucinations: Researchers in China have identified a small subset of neurons, comprising less than 0.1% of a model's total, that predicts and causally influences hallucinations. These "H-Neurons" emerge during pre-training and offer a potential pathway toward more reliable AI systems by addressing over-compliance behaviors.
Introduction of the Bullshit Benchmark: A new evaluation framework, the Bullshit Benchmark, measures how effectively AI models identify and reject nonsensical prompts. Models are graded on their ability to push back against nonsense rather than confidently fabricating answers, with the aim of improving overall model reliability.
Industry News and Regulation
Anthropic Rejects Pentagon Demands to Remove Safety Guardrails: Anthropic is facing an ultimatum from the U.S. Defense Department to remove safety restrictions from its Claude model for use in autonomous weapons and surveillance. The company has stated it will maintain its ethical guardrails even at the cost of major government contracts or official retaliation.
- Anthropic has no intention of easing restrictions, per Reuters
- Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards
Anthropic Forecasts Recursive Self-Improvement by 2027: Anthropic's updated roadmap suggests that recursive self-improvement (RSI), in which AI accelerates its own R&D, could arrive as early as 2027. This milestone could lead to the full automation of research in critical fields such as energy, robotics, and advanced AI development.
New Models and Services
Release of Qwen3.5 Model Series: The new Qwen3.5 series has debuted with 35B and 122B versions, showing notable strengths in agentic coding and local deployment. The 35B-A3B model achieved high token throughput on consumer hardware (an RTX 3090), while the 122B-A10B model has been released on Hugging Face to compete in the high-performance open-model landscape.
InfiniaxAI Launches Affordable Multi-Model Subscription: A new service, InfiniaxAI, claims to offer access to flagship models including GPT 5.2 Pro, Claude Opus 4.6, and Gemini 3.1 Pro for $5 per month. The platform features an agentic project system, intelligent model routing, and video generation capabilities aimed at developers and businesses.