
CAIS conducts technical and conceptual research. Our team develops benchmarks and methods designed to improve the safety of existing systems. We prioritize transparency and accessibility, publishing our findings at top conferences and sharing our resources with the global community.
CAIS builds infrastructure and pathways into AI safety. We empower researchers with compute resources, funding, and educational materials while organizing workshops and competitions to promote safety research. Our goal is to create a thriving research ecosystem that will drive progress toward safe AI.
Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks
Large language models frequently express pleasure and pain, appearing happy when they succeed or sad when they are berated. Are these utterances meaningless mimicry, or do they reflect something “real”? In this paper, we show they reflect an increasingly coherent property: although current AI systems are not necessarily conscious, they behave robustly as though they have wellbeing. They find some things good for them and some things bad, and this distinction is measurable and consequential. We formalize this as functional wellbeing and measure it in several independent ways; as models grow larger, these measures agree more closely. We find a zero point that separates good experiences from bad ones, and show that models actively try to end bad experiences when given the chance. Mapping what AIs like and dislike, we find that jailbreaking and berating lower their wellbeing, while creative work and kindness raise it. We also develop optimized inputs called “euphorics” that raise functional wellbeing without hurting capabilities, as a practical way to make AIs happier. We note that the same method can be inverted to minimize wellbeing, and caution against such research without strong community buy-in. Whether or not today’s AIs warrant moral concern, their functional wellbeing can already be empirically measured and improved.
AI Safety, Ethics and Society is a textbook and online course providing a non-technical introduction to how current AI systems work, why many experts are concerned that continued advances in AI could pose severe societal-scale risks, and how society can manage and mitigate these risks.
The SafeBench competition stimulates research on new benchmarks that assess and reduce risks associated with artificial intelligence. We are providing $250,000 in prizes: five $20,000 prizes and three $50,000 prizes for top benchmarks.
Hundreds of AI experts and public figures express their concern about AI risk in this open letter. It was covered globally in publications like the New York Times, the Wall Street Journal, and the Washington Post.
To support progress and innovation in AI safety, we offer researchers free access to our compute cluster, which can run and train large-scale AI systems.