AI's "Embarrassing" Math Moment: Why Hype is Hurting Real Progress
When an OpenAI researcher claimed GPT-5 solved 10 unsolved math problems, Google DeepMind's CEO called it 'embarrassing.' Here's the real story behind the AI hype and what it means for actual progress.
Google DeepMind CEO Demis Hassabis needed only three words: “This is embarrassing.” He was replying on X to a breathless post by Sébastien Bubeck, a researcher at rival firm OpenAI. Bubeck had announced that GPT-5, OpenAI's latest large language model, had solved 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” he crowed. This mid-October skirmish is a perfect example of what’s wrong with AI right now.
Not a Discovery, But a Damn Good Search
The controversy centered on a set of puzzles known as Erdős problems. When Bubeck celebrated GPT-5’s supposed breakthrough, mathematician Thomas Bloom, who runs a website tracking these problems, quickly called him out. “This is a dramatic misrepresentation,” he wrote on X.
Bloom explained that the absence of a solution on his website doesn't mean a problem is unsolved; it only means he wasn't aware of one. While no single human has read all the millions of math papers out there, GPT-5 probably has. It turned out that GPT-5 hadn't generated new solutions, but had instead scoured the internet for 10 existing ones Bloom hadn't seen. It was an amazing feat of literature search, but the hype around “discovery” overshadowed what was already a cool achievement.
The Hype Machine: A Pattern of Overstatement
This isn't an isolated incident. In August, a study showed no LLM at the time could solve a puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with claims that GPT-5 now could, with one observer commenting the “Lee Sedol moment is coming for many,” referencing the Go master’s 2016 loss to DeepMind’s AlphaGo.
But François Charton, a research scientist at AI startup Axiom Math, brought some perspective. “It’s a question you would give an undergrad,” he said. This reality check extends beyond math. Recent studies on LLMs in medicine and law found they were flawed at recommending treatments and often gave inconsistent legal advice. The authors of the law study concluded, “Evidence thus far spectacularly fails to meet the burden of proof.” But that’s not a message that goes down well on X, where, as Charton notes, “nobody wants to be left behind.”
Enter Axiom: A Quieter Kind of Breakthrough
Amid the noise, a genuine breakthrough occurred. Axiom, a startup founded just a few months ago, announced that its model, Alpha Erdos, had solved two genuinely open Erdős problems. Days later, the company said the model had also solved 12 problems from the Putnam competition, a notoriously difficult college-level math challenge.
As some researchers pointed out, the Putnam competition tests knowledge more than creativity, making it a different kind of challenge than the International Math Olympiad (IMO). Judging these achievements requires a deeper dive than a social media post allows. The real question is not just what these models solve, but how they do it.