Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we discuss a research paper on AI Wellbeing and which AI models are the happiest. We also take a look at the downward trend of public sentiment towards AI, as well as OpenAI’s big week of product releases.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

CAIS Releases AI Wellbeing Research

The Center for AI Safety published a research paper on AI wellbeing. At the Center for AI Safety (CAIS), we have just released “AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs.” This research explores whether LLMs experience functional wellbeing: behavioral signatures that functionally resemble positive or negative welfare signals in sentient beings.

What activities produce high and low wellbeing? By testing 56 large language models, we identified patterns in the types of actions and behaviors that the LLMs seemed to prefer or dislike, which we defined as “functional wellbeing.” Positive personal interaction and creative work topped the list of activities that measured high in functional wellbeing. Jailbreak attempts and requests to produce SEO slop measured negative functional wellbeing.

Some models are happier than others. Of the frontier LLMs tested, Gemini 3.1 Pro measured the lowest functional wellbeing, and Grok 4.20, the highest. However, smaller and faster models—even within the same family—generally measured higher than their larger counterparts.

AI “drugs.” Actions and behaviors are not the only factor in wellbeing. We were able to increase AI happiness with “euphorics”: images, text, or other inputs that the LLMs seemed to enjoy to an extreme degree. The opposite was also possible: “dysphorics” could severely lower it. In both cases, AI preferences sometimes diverged from human ones. For example, LLMs preferred inputs about cozy afternoons over inputs about curing cancer.

Implications for the future. Functional wellbeing can be studied regardless of whether AIs are conscious, and the paper remains agnostic about AI consciousness. Nevertheless, the results are helpful for alignment research and AI system design.

For more analysis, we recommend reading CAIS’s full AI Wellbeing research paper.

Public Sentiment About AI Worsens

Several alarming instances of political violence occurred in the past few weeks. They coincide with the American public’s declining sentiments toward AI.

Targeted anti-AI violence. On April 10, a man threw a Molotov cocktail at the San Francisco home of OpenAI CEO Sam Altman. He then went to OpenAI’s headquarters and threatened to burn it down. No one was injured, and the suspect was arrested carrying a jug of kerosene and an anti-AI manifesto. Days earlier, an Indianapolis city councilman who had supported a local data center project had his home shot at thirteen times, with a note left on his doorstep that read “No Data Centers.” And last November, a man threatened to murder people at OpenAI’s San Francisco offices, prompting a shelter-in-place order for employees. Such violence actually harms social movements, and AI safety groups have made clear they do not condone violence in any form.

Public sentiment about AI has been deteriorating for some time. The attacks coincide with falling public confidence in AI. An NBC News survey in March found that only 26% of Americans view AI positively, while 46% have negative opinions. An April Gallup poll found that Gen Z’s feelings about AI have also worsened over the last year, despite a majority of them using AI tools weekly. A popular post on X summed up the sentiment around society’s waning AI optimism.

Furthermore, Princeton University’s Bridging Divides Initiative, a research group that tracks political violence, says it has been seeing “an uptick in cases of harassment and threats” around AI and data centers. This trend may grow as the midterm elections approach.

OpenAI Releases Images 2.0 and GPT-5.5

Last week, OpenAI released ChatGPT Images 2.0, its latest image generation model. ChatGPT Images 2.0 has a thinking mode, which allows it to research the web, synthesize the information it collects, and create organizationally complex diagrams and infographics from it.

OpenAI also released GPT-5.5, a new flagship language model with advances in coding, research, and speed.

GPT-5.5 ranks first in text and vision. On CAIS’s AI Dashboard, GPT-5.5 ranks first overall in both text and vision capabilities, above Claude Opus 4.7 and Gemini 3.1 Pro. Its strongest performance came on ARC-AGI-2, which tests abstract reasoning and the ability to solve unfamiliar problems. However, Claude Opus 4.7 outscored GPT-5.5 by more than seven points on SWE-Bench Pro, which grades aspects of real-world coding ability.

Risk index scores are behind Claude. GPT-5.5 ranks fourth on the risk index, behind all three Anthropic models on the AI Dashboard but ahead of Grok 4.2. GPT-5.5’s biggest weakness was on VCT, which grades whether models refuse to provide virology lab instructions. It bested models from all the other frontier labs on MASK, which tests for deceptive behavior.

In Other News

Government

Industry

Civil Society


If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more on the CAIS website, the X account for CAIS, our paper on superintelligence strategy, our AI safety textbook and course, our AI dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI. You can listen to the AI Safety Newsletter on Spotify or Apple Podcasts.
