AI's "Embarrassing" Math Moment: Why Hype is Hurting Real Progress

When an OpenAI researcher claimed GPT-5 solved 10 unsolved math problems, Google DeepMind's CEO called it 'embarrassing.' Here's the real story behind the AI hype and what it means for actual progress.

Google DeepMind CEO Demis Hassabis needed only three words: “This is embarrassing.” He was replying on X to a breathless post by Sébastien Bubeck, a researcher at rival firm OpenAI. Bubeck had announced that GPT-5, OpenAI's latest large language model, had solved 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” he crowed. This mid-October skirmish is a perfect example of what’s wrong with AI right now.

Not a Discovery, But a Damn Good Search

The controversy centered on a set of puzzles known as Erdős problems. When Bubeck celebrated GPT-5’s supposed breakthrough, mathematician Thomas Bloom, who runs a website tracking these problems, quickly called him out. “This is a dramatic misrepresentation,” he wrote on X.

Bloom explained that a problem listed as open on his website is not necessarily unsolved; it only means he was not aware of a published solution. There are millions of math papers out there, and no single human has read them all, but GPT-5 probably has. It turned out that GPT-5 hadn't generated new solutions. Instead, it had scoured the internet and found existing solutions to 10 problems that Bloom hadn't seen. That was an impressive feat of literature search, but the hype about a “discovery” overshadowed what was already a cool achievement.

The Hype Machine: A Pattern of Overstatement

This isn't an isolated incident. In August, a study showed that no LLM at the time could solve a puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with claims that GPT-5 now could, with one observer commenting that the “Lee Sedol moment is coming for many,” a reference to the Go master’s 2016 loss to DeepMind’s AlphaGo.

But François Charton, a research scientist at AI startup Axiom Math, offered some perspective. “It’s a question you would give an undergrad,” he said. This reality check extends beyond math. Recent studies of LLMs in medicine and law found that the models made flawed treatment recommendations and often gave inconsistent legal advice. The authors of the law study concluded, “Evidence thus far spectacularly fails to meet the burden of proof.” But that’s not a message that goes down well on X, where, as Charton notes, “nobody wants to be left behind.”

Enter Axiom: A Quieter Kind of Breakthrough

Amid the noise, a genuine breakthrough occurred. Axiom, a startup founded just a few months ago, announced that its model, Alpha Erdos, had solved two genuinely open Erdős problems. Days later, it solved an impressive 12 problems from the Putnam competition, a notoriously difficult college-level math challenge.

As some researchers pointed out, the Putnam competition tests knowledge more than creativity, which makes it a different kind of challenge from the International Mathematical Olympiad (IMO). Judging these achievements requires a deeper dive than a social media post allows. The real question is not just what these models solve, but how they solve it.
