Amazon's AI Coding Tools Helped Break Its Own Website
Amazon's site crashed for six hours last week. Internal memos tie recent outages, including four Sev 1 incidents in a single week, to AI-assisted code changes, even as the company slashes corporate jobs and bets $200B on AI infrastructure.
For roughly six hours last Thursday, millions of Amazon shoppers couldn't check out, couldn't see prices, and couldn't access their accounts. The official explanation: a "software code deployment" issue. The internal explanation, buried in memos now circulating in newsrooms, is considerably more awkward.
Four Outages. One Week. One Common Thread.
Dave Treadwell, Amazon's Senior Vice President of eCommerce Foundation, sent a note to employees that CNBC obtained and confirmed. The message was blunt: "Folks—as you likely know, the availability of the site and related infrastructure has not been good recently."
"Not good" is an understatement. Amazon logged four Sev 1 incidents within a single week. Sev 1 is the highest severity classification, reserved for outages or severe degradation of critical systems. Treadwell said the situation required a "deep dive" to "regain our strong availability posture."
A separate memo from Treadwell identified the common thread running through recent incidents dating back to Q3 2025: "GenAI-assisted changes." More specifically, the memo describes "GenAI tools supplementing or accelerating production change instructions, leading to unsafe practices." He also acknowledged, plainly, that "best practices and safeguards" around generative AI usage "haven't been fully established yet."
That last sentence deserves a moment.
The Math Behind the Mess
To understand how this happened, you need to look at two numbers moving in opposite directions at Amazon.
On the investment side: Amazon plans roughly $200 billion in capital expenditures in 2026, more than any other tech company. The bulk of that is AI infrastructure. The company is building fast, spending big, and racing to stay ahead in the cloud and AI arms race.
On the headcount side: Amazon laid off approximately 16,000 corporate employees in January 2026, following a cut of roughly 14,000 in October 2025. Between 2022 and 2023, another 27,000+ roles were eliminated. That's more than 57,000 positions gone in just over three years.
The tension between those two trends shows up in Treadwell's memos. With fewer senior engineers in the room, junior staffers have been using AI coding tools to push changes into production environments. The short-term fix Amazon announced makes the dynamic explicit: going forward, "GenAI-assisted" production changes made by lower-level staff must be reviewed by more senior engineers.
In other words, the safeguard for AI-generated code is... more humans.
Not Just Retail
Amazon Web Services has had its own rough patch. In December, reports surfaced that an extended outage of a cost management feature followed changes made by Kiro, Amazon's internal AI coding tool. Amazon called it "user error, not AI" at the time. Read alongside Treadwell's latest memos, that framing looks increasingly thin.
Amazon said Tuesday that AWS is not involved in the incidents referenced by Treadwell; the retail and cloud divisions are treating these as separate problems. But for customers and investors, the distinction may feel academic when both sides of the business are logging high-severity failures within months of each other.
The company's planned response includes what Treadwell called "controlled friction"—deliberately slowing down changes to the most critical parts of the retail experience—while investing in longer-term "deterministic and agentic safeguards." The word temporary appears in the memo. The word permanent does not.
What the Industry Is Watching
Amazon is not alone in this predicament. Across the industry, companies are deploying AI coding assistants—GitHub Copilot, Cursor, Devin, and others—to accelerate development cycles. The productivity gains are real. So, apparently, are the risks when those tools operate in production environments without adequate review layers.
For cloud engineers and AI developers, this is the canary in the coal mine. The question isn't whether AI can write code. It's whether organizations are building the oversight infrastructure fast enough to match the speed at which AI is generating it.
For investors, the outages add a wrinkle to an otherwise bullish Amazon story. The stock has been called "cheap" by some analysts, and the long-term AI infrastructure bet remains intact. But repeated availability failures on the core retail platform—the cash engine that funds everything else—are not a rounding error.
For Amazon customers, the six-hour lockout was an inconvenience. The next one could be longer.
This content is AI-generated based on source articles. While we strive for accuracy, errors may occur. We recommend verifying with the original source.