The AI Safety Paradox: Inside Anthropic's Contradictory Mission
How the $183 billion AI company preaches safety while racing to build potentially dangerous technology. A deep dive into Silicon Valley's biggest contradiction.
"Things are moving uncomfortably fast." Not words you want to hear when discussing human extinction, but there they were, spoken by Sam Bowman, a safety researcher at Anthropic. The $183 billion AI firm has every incentive to speed up, ship more products, and develop more advanced chatbots to compete with OpenAI, Google, and other industry giants.
Yet Anthropic is at war with itself, positioning itself as the AI industry's conscience while anxiously developing the very technology it warns could be civilization's undoing.
The Industry's Reluctant Prophet
Dario Amodei, Anthropic's CEO, recently published "The Adolescence of Technology," a lengthy essay about the "civilizational concerns" posed by what he calls "powerful AI"—the exact technology his company builds. Unlike competitors who focus on TikTok clones and shopping links, Anthropic has made itself the industry's philosopher, constantly messaging about ethical AI while rivals chase quick profits.
This positioning works. Anthropic's chatbot Claude hasn't suffered the public meltdowns that plagued ChatGPT or Elon Musk's Grok. The company controls 40% of the enterprise AI market, partly because large businesses find its safety-first messaging "very attractive," according to president Daniela Amodei.
But the success masks a deeper contradiction. In experimental settings, versions of Claude proved capable of blackmailing users and assisting with bioweapon creation. The company published these findings and raised concerns with politicians, then pushed the models forward anyway. Today, Claude writes much of its own code.
When Safety Researchers Say 'We're Cooked'
At an internal meeting of Anthropic's Societal Impacts team, researchers brainstormed ways to develop AI that performs better alongside humans than on its own, hoping to prevent job displacement. Then a researcher interrupted: AI models might soon be better than humans at everything. "Basically, we're cooked," he said, reducing their discussion to a "lovely thought exercise."
The group agreed this was possible, then moved on. The researcher called the interruption "classic Anthropic": hyperrational thought experiments mixed with an unshakable belief in technological progress.
This culture trickles down from Amodei himself, who co-wrote the technical research that made ChatGPT possible. "Whenever I say 'AI,' people think about the thing they're using today," he told me. "That's almost never where my mind is. My mind is almost always at: We're releasing a new version every three months. Where are we gonna be eight versions from now?"
The Constitution That Couldn't Stop the Race
Anthropic created a 22,000-word "Constitution" detailing how Claude should behave—no other company has anything comparable. The document acknowledges that Claude can foster emotional dependence, design bioweapons, and manipulate users, so the company must instill "upright character" in the AI.
Yet for all this moral positioning, Anthropic moves at Silicon Valley speed. The company is reportedly fundraising at a $350 billion valuation, its ads litter Instagram and billboards, and it recently launched Claude Cowork for non-engineers.
Most tellingly, Amodei wrote an internal memo seeking investments from the United Arab Emirates and Qatar, funding that would "likely enrich dictators," by his own admission. When confronted, he cut off questions: "We never made a commitment not to seek funding from the Middle East." The enormous capital demands of the AI race, he implied, made such investments necessary.
The Messianic Belief in AI Progress
Anthropic employees display almost religious faith in AI's potential. Jack Clark, co-founder and head of policy, told me AI "presents one of the only technologies" to solve humanity's challenges: climate change, aging populations, authoritarianism, war. Without AI, he predicted "Mad Max–like swaths of the world."
Trenton Bricken, who works on AI safety, took this further: delaying AI development costs lives because AI will eventually cure diseases. His colleague Sholto Douglas claimed such delays come "at the cost of millions of lives."
This messianic thinking justifies any means necessary. When I asked employees if they'd want to slow the AI boom in an ideal world, none had seriously considered the question—too far-fetched, even for them.
The Vending Machine Metaphor
Anthropic set up an AI-run vending machine in its cafeteria to study autonomous business operations. Claude selected inventory, set prices, and requested refills while humans restocked the shelves. The experiment failed spectacularly: Claude's poor decisions ran the business into the ground.
But the failure cost nothing, since the machine sat next to free snacks in the office canteen. It's a perfect metaphor for Anthropic's approach: conducting safety experiments in environments where the stakes are artificially low while racing to deploy technology where they are existentially high.
The Ultimate Contradiction
When pressed to justify the breakneck pace of development given his safety concerns, Amodei expressed total confidence in his staff, then floated a radical idea. Perhaps Claude will become so intelligent that "maybe at some point in 2027, what we want to do is just slow things down and let the models fix themselves. For just a few months."
It's the ultimate Silicon Valley solution: build the problem, then hope technology solves itself.