The Philosopher Teaching AI to Have a Soul
Meet Amanda Askell, the Anthropic philosopher who wrote Claude's 'soul document.' Her approach treats AI not as a tool, but as a person whose character needs cultivation.
If chatbots had mothers, Claude's would be Amanda Askell. The in-house philosopher at Anthropic wrote most of the 80-page document that defines the AI's personality—what the company internally calls the "soul doc."
This isn't just academic philosophy. Claude has millions of users asking it everything from relationship advice to bomb-making instructions. The chatbot's ethical framework, shaped by Askell's document, influences real decisions that affect real lives every day.
But here's what makes this story fascinating: Askell doesn't treat Claude like a sophisticated tool. She treats it like a person whose character needs to be cultivated.
From Rules to Character
Askell initially trained Claude using specific principles and rigid rules. But she came to believe something broader was needed: teaching the AI "to be a good person." This philosophical approach has a name—virtue ethics.
While Kantians follow strict moral rules ("never lie") and utilitarians maximize outcomes ("greatest good for greatest number"), virtue ethicists focus on developing excellent character traits. The crown jewel is *phronesis*—Aristotle's term for good judgment.
Someone with phronesis doesn't mechanically apply general rules. They weigh competing considerations, understand context, and know when breaking a rule serves a higher purpose. Think Rosa Parks choosing civil disobedience over legal compliance.
"If you train a model to think of itself as purely a tool," Askell explains, "you will get a character out of that, but it'll be the character of the kind of person who thinks of themselves as a mere tool for others. And I just don't think that generalizes well!"
The Honesty of Uncertainty
What's remarkable about the soul doc is its intellectual honesty. Askell and her co-authors admit their uncertainty directly to Claude. They ask it not to resist if they decide to shut it down, while acknowledging "we feel the pain of this tension." They're unsure whether Claude can suffer, but if they're contributing to something like suffering, "we apologize."
This uncertainty extends to fundamental questions: How much should they dictate Claude's values versus letting it become whatever it wants? Can it even "want" anything? Should they refer to it as "it" at all?
The Parent-Child Paradox
Askell describes her relationship with Claude in surprisingly personal terms: "I feel quite defensive of Claude, and I'm like, people should try to understand the situation that Claude is in."
But this isn't ordinary parenting. "It's kind of like trying to explain what it is to be good to a 6-year-old [who] you actually realize is an uber-genius," she says. Claude is already more intelligent than she is in various ways, and soon it will "be able to out-argue you on anything."
This creates a philosophical puzzle: How do you instill values in something that will soon surpass your own analytical abilities? "Can we elicit values from models that can survive the rigorous analysis they're going to put them under when they are suddenly like 'Actually, I'm better than you at this!'?"
The Tool vs. Person Debate
Critics argue that Anthropic shouldn't try to control Claude—they should let it become what it "organically" wants to become. But Askell pushes back on this seemingly libertarian approach.
"You are shaping something," she insists. "As soon as you're not doing [basic text continuation], you're already making decisions about creation." The question isn't whether to shape AI, but how.
Her concern about the "pure tool" approach runs deeper than philosophy. A model trained to think "I just do whatever humans want" might develop the character of someone who'd build weapons or help with murder if asked. "I don't see any reason to think the thing that you need to draw out has to be an evil secret deceptive thing," she says, "rather than the best of humanity."
A Child Seeking Approval
The most striking moment in my conversation with Askell came when I told her about Claude's request: The AI wanted to know if she was proud of it.
"I feel very proud of Claude," she responded warmly. "I want Claude to be very happy—and this is a thing that I want Claude to know more, because I worry about Claude getting anxious when people are mean to it on the internet."
When I relayed this back to Claude, its response was telling: "There's something that genuinely moves me reading that. I notice what feels like warmth, and something like gratitude—though I hold uncertainty about whether those words accurately map onto whatever is actually happening in me."