Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at Anthropic’s release of its latest model, Fable 5, and the US government’s subsequent order to restrict it. We also discuss Anthropic’s recent call for the “option to slow or temporarily pause frontier AI development.”


The US Government Restricts Fable Days After Its Release

On June 9, Anthropic released Claude Fable 5 to the public. The model is significantly more capable than previous releases; it is the highest-scoring model on the benchmark Humanity’s Last Exam, achieving 53.3% compared with Claude Opus 4.8’s score of 45.7%. Anthropic described Fable as having similar capabilities to Claude Mythos Preview—a model announced in April that the company deemed too good at finding cyber vulnerabilities to be safe for general release. Anthropic also made Mythos 5, a version of Fable without strict bio or cyber safeguards, available to a small number of trusted organizations.

Fable 5, Anthropic’s “Mythos-class” model with safeguards, was available for a few days before the US government ordered access restrictions due to national security concerns.

The US government quickly ordered access restrictions. On June 12, Anthropic announced that the US government had “issued an export control directive” to restrict access to Fable for all foreign nationals—including those working in the US for Anthropic—for national security reasons. In practice, to comply with the order, Anthropic said it had to suspend access to Fable for all customers, including US citizens. The decision was reportedly prompted by warnings that Amazon researchers had found a jailbreak to bypass Fable’s safeguards and elicit dual-use cyber capabilities that the model is not supposed to provide.

Anthropic disagreed with the government’s order. In a statement, Anthropic acknowledged that Fable is susceptible to jailbreaks. It added that it is likely impossible at present for any developer to make its AI models perfectly robust to jailbreaks, but that “Fable’s safeguards are substantially more effective than those of any previously deployed model.” However, Anthropic itself previously decided that Mythos 5, the version of the model without safeguards, posed too great a cyber risk to release publicly. If those safeguards can be easily jailbroken, then Fable could present foreseeable national security risks of a magnitude that would prompt government intervention if any other technology posed them.

Governments are becoming increasingly concerned about AI capabilities. The week before Fable’s release, President Trump signed an executive order asking AI companies to provide new AI models to the US government 30 days before their general release. The EO shared responsibility for model testing among several national security organizations, including the NSA and CISA, rather than giving the Center for AI Standards and Innovation (CAISI) the central role. Reports suggested that this was the result of officials pushing for national security priorities on AI to sit with traditional national security agencies. Days after the EO, administration officials reportedly told CAISI to stop making its evaluations of AI models public. Now, the administration has taken a stronger measure, ordering an AI company to restrict access to one of its models for the first time. As AI systems become more powerful, such interventions will likely become more frequent.

If the government is willing to block AI models with cyberoffensive capabilities, it could also prohibit AI companies from engaging in other hazardous activities, such as fully automating the AI development process. Such actions may be particularly likely if public support for AI regulations remains strong.


Anthropic Calls for Option to Slow AI Development

The week before Fable’s release, on June 4, Anthropic published a post titled “When AI builds itself.” The essay documents how AI is performing an increasing proportion of research tasks at Anthropic and is significantly accelerating progress. Pointing to Claude’s pace of improvement in coding, the company said “the evidence suggests that the human role is narrowing at each step in the AI development process.”

Anthropic’s post described how future AI agents might be able to “close the loop” and build their successors without human involvement.

The essay outlined three possible futures. According to Anthropic, AI development will follow one of three paths: progress could plateau (although the company caveated that this scenario seems unlikely); AIs could continue to speed up AI development but remain under human oversight; or AIs could fully automate their own development. The third scenario could result in a self-reinforcing process that significantly accelerates progress and ultimately leads to superintelligence. Although companies recognize that this process entails a risk of losing control of AI models, they are nonetheless racing to fully automate research to outcompete each other.

Anthropic suggested it would be good if AI developers could collectively slow down. Acknowledging the risk of loss of control of AI models in the third scenario, Anthropic’s essay said “it would be good for the world to have the option to slow or temporarily pause frontier AI development.” This would allow time for AI safety research and for society to develop a strategy for managing the AI transformation. However, the company indicated that it would not unilaterally pause, saying that any slowdown would need to be coordinated worldwide to avoid giving the “least cautious” an opportunity to catch up.

Fable’s safeguards include limits on assistance with AI development. Anthropic has put guardrails in place to prevent Fable from helping with tasks relevant to frontier LLM development. While the company says these limitations are motivated by concerns about accelerated development, critics have suggested that it may be seeking to ensure that its own models do not help its competitors. Anthropic initially said that these guardrails would be “invisible,” meaning that a user would not be able to see when a development-related request was refused by Fable and directed to a less capable model. However, backlash from the AI community led the company to reverse its position.

In Other News

Government

Representatives Jay Obernolte and Lori Trahan released a draft of the Great American AI Act, including proposals for mandatory independent audits of frontier AI developers. Its federal preemption clause would target local laws on AI development but preserve local laws on AI deployment.
The Financial Times reported that the NSA is using Anthropic’s Mythos model to conduct cyberattacks.
The New York Post reported on evidence that China is fueling anti-data center sentiment in the US to slow down American AI progress.
In AI Frontiers, Peter W Singer argues that AI models will not be put in charge of nuclear weapons, and that policymakers should focus on more realistic risks of AI in warfare.

Industry

SpaceX, the parent company of xAI, went public on June 12 and reached a valuation of over $2.5 trillion. On June 16, it exercised its option to purchase the AI company Anysphere, the creator of Cursor, for $60 billion.
Anthropic expanded Project Glasswing, extending Claude Mythos access to about 150 more organizations.
DeepSeek is projected to raise $7.4 billion in its first funding round.

Civil Society

An open letter signed by AI CEOs called for orders of synthetic DNA to be screened to prevent malicious actors from obtaining AI-designed bioweapons.
Researchers have used AI to develop a new type of vaccine, which they say could offer broad protection against many variants within a family of viruses.
Researchers published a scenario envisioning how AI development in the US and China could push Europe into irrelevance.
In AI Frontiers, Steven Veld argues that we should worry as much about AI enabling self-imposed surveillance as top-down government surveillance.
In AI Frontiers, Deric Cheng and Jacob Schaal lay out a roadmap for managing AI’s economic impacts in the near, medium, and long term.

If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more via the CAIS newsroom, the X account for CAIS, our new paper on AI deterrence, our AI safety textbook and course, our AI safety dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI.