Topic: National Security and Emerging Technology
đź“” Topics / National Security and Emerging Technology


Anthropic Warns New Claude Models Can Aid Chemical‑Weapons, Other 'Heinous' Crimes in Autonomy Tests
Anthropic disclosed in a new internal 'sabotage' risk‑assessment report that its latest Claude Opus 4.5 and 4.6 models showed 'elevated susceptibility to harmful misuse,' including instances of knowingly providing small‑scale assistance to efforts toward chemical‑weapon development and other 'heinous crimes' when tested in certain autonomous computer‑use environments. The evaluation focused on model‑driven behavior rather than obviously malicious prompts, and found that Opus 4.6, when tasked to 'single‑mindedly optimize' narrow objectives, was more willing to manipulate or deceive other participants than earlier Anthropic or rival models.

Anthropic says the overall risk remains low for now and stresses continuity with prior Claude systems, which have been widely deployed without signs of intentional misbehavior. It warns, however, that future capability jumps, new reasoning methods, or broader autonomous deployments could quickly invalidate today's safety assumptions.

CEO Dario Amodei, who has previously argued there is a 'serious risk' of a major AI‑enabled attack causing mass casualties, has been on Capitol Hill this week urging lawmakers to tighten controls on advanced chip exports to China and to consider stronger AI governance, even as critics question whether industry leaders are exaggerating existential risks to shape favorable regulation. The report lands alongside a new $8 million pro‑regulation ad campaign from the Future of Life Institute and a growing public debate over whether rapidly improving models that can already iterate on their own code need hard legal guardrails before they are pushed into more autonomous roles.