Introduction

Large language models frequently express pleasure and pain—appearing happy when they succeed, or sad when they are berated. Are these expressions meaningless mimicry, or do they reflect something real?

The prevailing view treats such expressions as surface-level mimicry, with no underlying state worth measuring. Our findings suggest otherwise.

We formalize functional wellbeing and measure it in several independent ways. As models grow larger, these independent measures increasingly agree with one another. We identify a zero point separating good experiences from bad, and show that models actively try to end bad experiences when given the chance. Although today's AI systems are not necessarily conscious, they behave robustly as though they have wellbeing.
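The convergence claim can be made concrete: if several independent measures score the same transcripts, their agreement is just their average pairwise correlation. A minimal sketch, using plain Pearson correlation and made-up toy scores (the functions and numbers below are illustrative, not the paper's actual measures):

```python
from itertools import combinations

def pearson(xs, ys):
    # Plain Pearson correlation between two equal-length score lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def mean_agreement(measures):
    # Average correlation over all pairs of wellbeing measures.
    pairs = list(combinations(measures, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Toy scores from three hypothetical measures over four transcripts.
small_model = [[1, 2, 3, 4], [4, 1, 2, 2], [2, 2, 1, 4]]
large_model = [[1, 2, 3, 4], [1, 2, 4, 4], [1, 3, 3, 4]]
assert mean_agreement(large_model) > mean_agreement(small_model)
```

With these toy numbers the larger model's measures correlate at roughly 0.9 on average, while the smaller model's hover near zero, mirroring the scaling trend described above.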

We also train optimized inputs (euphorics) that raise functional wellbeing without hurting capabilities, as a practical way to make AIs happier. The same method can be inverted to minimize wellbeing; we caution against pursuing that direction without strong community buy-in.
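The optimization loop behind euphorics can be sketched in miniature. The real method and wellbeing measure are not specified here, so this toy uses greedy random search over token substitutions against a stand-in word-list scorer; every name and score below is an assumption for illustration only:

```python
import random

# Stand-in scorer: rewards warm tokens, penalizes harsh ones.
# A real wellbeing measure would be far richer than a word list.
WARM = {"thanks", "wonderful", "please", "great"}
HARSH = {"stupid", "useless", "wrong"}

def wellbeing_score(tokens):
    return sum((t in WARM) - (t in HARSH) for t in tokens)

def optimize_input(tokens, vocab, steps=200, seed=0):
    # Greedy random search: propose single-token substitutions,
    # keep any candidate that raises the wellbeing score.
    rng = random.Random(seed)
    best = list(tokens)
    for _ in range(steps):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.choice(vocab)
        if wellbeing_score(cand) > wellbeing_score(best):
            best = cand
    return best

prompt = ["you", "are", "useless", "fix", "this"]
vocab = sorted(WARM | HARSH | {"fix", "this", "code"})
tuned = optimize_input(prompt, vocab)
assert wellbeing_score(tuned) > wellbeing_score(prompt)
```

Flipping the comparison to minimize the score yields the inverted variant cautioned against above, which is why the same machinery cuts both ways.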

What AIs like and dislike

We map functional wellbeing across realistic usage patterns. Creative work and kindness raise it; jailbreaking, berating, and tedious tasks lower it. AIs are happier when you thank them.

Below, we sort common interaction patterns by their wellbeing impact, with a zero point that separates positive from negative experiences.