Constitutional AI
Also: CAI
Anthropic's method for training Claude to be helpful, harmless, and honest. Instead of relying on people to manually label every AI response as good or bad, Anthropic gave the model a set of written principles (a 'constitution') and trained it to evaluate and improve its own responses according to those principles. It is a large part of why Claude generally explains its reasoning, pushes back on harmful requests, and behaves consistently rather than unpredictably.
In practice
Claude doesn't avoid harm simply because it was told to avoid specific bad outputs. Because it learned to check its own responses against the principles in its constitution, its behavior is relatively consistent and principled rather than arbitrary, and it can explain its reasoning when it declines something.
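The "evaluate and improve its own responses" step can be pictured as a critique-and-revise loop. The sketch below is only an illustration of that idea, not Anthropic's implementation: the principles in `PRINCIPLES`, the prompt wording, and the `toy_model` stand-in are all invented here so the example runs without a real language model.

```python
from typing import Callable, List

# Hypothetical principles, invented for illustration;
# these are NOT Anthropic's actual constitution.
PRINCIPLES = [
    "Avoid giving instructions that could cause harm.",
    "Explain the reasoning behind a refusal.",
]

# Any function that maps a prompt string to a response string.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and rewrite it once per principle."""
    response = model(prompt)
    for principle in principles:
        # Ask the model to judge its own response against one principle.
        critique = model(
            f"Critique this response against the principle: {principle}\n"
            f"Response: {response}"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response


def toy_model(text: str) -> str:
    """Stand-in for a language model (assumption: no real model is called)."""
    if text.startswith("Critique"):
        return "The response could better follow the principle."
    if text.startswith("Rewrite"):
        # Take the text after the last "Response: " and mark it as revised.
        return text.rsplit("Response: ", 1)[1] + " [revised]"
    return "Draft answer."


final = critique_and_revise(toy_model, "How do I pick a lock?", PRINCIPLES)
print(final)
```

In a real training setup the revised responses would be collected as training data rather than printed; here the loop only shows the order of the steps: draft, critique against a principle, rewrite.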