Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at Anthropic’s release of its latest model, Fable 5, and the US government’s subsequent order to restrict it. We also discuss Anthropic’s recent call for the “option to slow or temporarily pause frontier AI development.”

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

The US Government Restricts Fable Days After Its Release

On June 9, Anthropic released Claude Fable 5 to the public. The model is significantly more capable than previous releases; it is the highest-scoring model on the benchmark Humanity’s Last Exam, achieving 53.3% compared with Claude Opus 4.8’s score of 45.7%. Anthropic described Fable as having similar capabilities to Claude Mythos Preview—a model announced in April that the company deemed too good at finding cyber vulnerabilities to be safe for general release. Anthropic also made Mythos 5, a version of Fable without strict bio or cyber safeguards, available to a small number of trusted organizations.

Fable 5, Anthropic’s “Mythos-class” model with safeguards, was available for a few days before the US government ordered access restrictions due to national security concerns.

The US government quickly ordered access restrictions. On June 12, Anthropic announced that the US government had “issued an export control directive” to restrict access to Fable for all foreign nationals—including those working in the US for Anthropic—for national security reasons. In practice, to comply with the order, Anthropic said it had to suspend access to Fable for all customers, including US citizens. The decision was reportedly prompted by warnings that Amazon researchers had found a jailbreak to bypass Fable’s safeguards and elicit dual-use cyber capabilities that the model is not supposed to provide.

Anthropic disagreed with the government’s order. In a statement, Anthropic acknowledged that Fable is susceptible to jailbreaks, but added that it is likely impossible at present for any developer to make its AI models perfectly robust to jailbreaks, and that “Fable’s safeguards are substantially more effective than those of any previously deployed model.” However, Anthropic itself previously decided that Mythos 5, the version of the model without safeguards, posed too great a cyber risk to release publicly. If those safeguards can be easily jailbroken, Fable could present foreseeable national security risks of a magnitude that could prompt government intervention if posed by any other technology.

Governments are becoming increasingly concerned about AI capabilities. The week before Fable’s release, President Trump signed an executive order asking AI companies to provide new AI models to the US government 30 days before their general release. The EO shared responsibility for model testing among several national security organizations, including the NSA and CISA, rather than giving the Center for AI Standards and Innovation (CAISI) the central role. Reports suggested that this reflected pressure from officials who wanted national security priorities on AI to fall under traditional national security agencies. Days after the EO, administration officials reportedly told CAISI to stop making its evaluations of AI models public. Now, the administration has taken a stronger measure, ordering an AI company to restrict access to one of its models for the first time. As AI systems become more powerful, such interventions will likely become more frequent.

If the government is willing to block AI models with cyberoffensive capabilities, it could also prohibit AI companies from engaging in other hazardous activities, such as fully automating the AI development process. Such actions may be particularly likely if public support for AI regulations remains strong.


Anthropic Calls for Option to Slow AI Development

The week before Fable’s release, on June 4, Anthropic published a post titled “When AI builds itself.” The essay documents how AI is performing an increasing proportion of research tasks at Anthropic and is significantly accelerating progress. Pointing to Claude’s pace of improvement in coding, the company said “the evidence suggests that the human role is narrowing at each step in the AI development process.”

Anthropic’s post described how future AI agents might be able to “close the loop” and build their successors without human involvement.

The essay outlined three possible futures. According to Anthropic, AI development will follow one of three paths: progress could plateau (although the company noted that this scenario seems unlikely); AIs could continue to speed up AI development while remaining under human oversight; or AIs could fully automate their own development. The third scenario could trigger a self-reinforcing process that significantly accelerates progress and ultimately leads to superintelligence. Although companies recognize that this process entails a risk of losing control of AI models, they are nonetheless racing to fully automate research to outcompete each other.

Anthropic suggested it would be good if AI developers could collectively slow down. Acknowledging the risk of loss of control of AI models in the third scenario, Anthropic’s essay said “it would be good for the world to have the option to slow or temporarily pause frontier AI development.” This would allow time for AI safety research and for society to develop a strategy for managing the AI transformation. However, the company indicated that it would not unilaterally pause, saying that any slowdown would need to be coordinated worldwide to avoid giving the “least cautious” an opportunity to catch up.

Fable’s safeguards include limits on assistance with AI development. Anthropic has put guardrails in place to prevent Fable from helping with tasks relevant to frontier LLM development. While the company says these limitations are motivated by concerns about accelerated development, critics have suggested that it may be seeking to ensure that its own models do not help its competitors. Anthropic initially said that these guardrails would be “invisible,” meaning that a user would not be able to see when a development-related request was refused by Fable and directed to a less capable model. However, backlash from the AI community led the company to reverse its position.

In Other News

Government

Industry

Civil Society

If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more via the CAIS newsroom, the X account for CAIS, our new paper on AI deterrence, our AI safety textbook and course, our AI safety dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI.
