Anthropic Defies Pentagon Over Safety Guardrails as Researchers Identify Hallucination-Causing "H-Neurons"
AI Safety and Research
Discovery of "H-Neurons" Linked to LLM Hallucinations: Researchers in China have identified a small subset of neurons, comprising less than 0.1% of a model's total, that predicts and causally influences hallucinations. These "H-Neurons" emerge during pre-training and offer a potential pathway toward more reliable AI systems by addressing over-compliance behaviors.
Introduction of the Bullshit Benchmark: A new evaluation framework, the Bullshit Benchmark, measures how effectively AI models identify and reject nonsensical prompts. Models are graded on their ability to push back against nonsense rather than confidently fabricating answers, with the aim of improving overall model reliability.
Industry News and Regulation
Anthropic Rejects Pentagon Demands to Remove Safety Guardrails: Anthropic is facing an ultimatum from the U.S. Defense Department to remove safety restrictions from its Claude model for use in autonomous weapons and surveillance. The company has stated it will maintain its ethical guardrails even at the cost of major government contracts or official retaliation.
- Anthropic has no intention of easing restrictions, per Reuters
- Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards
Anthropic Forecasts Recursive Self-Improvement by 2027: Anthropic's updated roadmap suggests that recursive self-improvement (RSI), in which AI accelerates its own R&D, could arrive as early as 2027. This milestone could lead to the full automation of research in critical fields such as energy, robotics, and advanced AI development.
New Models and Services
Release of Qwen3.5 Model Series: The new Qwen3.5 series has debuted with 35B and 122B versions, showing notable strengths in agentic coding and local deployment. The 35B-A3B model achieved high token throughput on consumer hardware (an RTX 3090), while the 122B-A10B model has been released on Hugging Face to compete in the high-performance open-model landscape.
InfiniaxAI Launches Affordable Multi-Model Subscription: A new service, InfiniaxAI, claims to offer access to flagship models including GPT 5.2 Pro, Claude Opus 4.6, and Gemini 3.1 Pro for $5 per month. The platform features an agentic project system, intelligent model routing, and video generation capabilities aimed at developers and businesses.