AI's "Embarrassing" Math Moment: Why Hype is Hurting Real Progress
When an OpenAI researcher claimed GPT-5 solved 10 unsolved math problems, Google DeepMind's CEO called it 'embarrassing.' Here's the real story behind the AI hype and what it means for actual progress.
Google DeepMind CEO Demis Hassabis needed only three words: “This is embarrassing.” He was replying on X to a breathless post by Sébastien Bubeck, a researcher at rival firm OpenAI. Bubeck had announced that GPT-5, OpenAI's latest large language model, had solved 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” he crowed. This mid-October skirmish is a perfect example of what’s wrong with AI right now.
Not a Discovery, But a Damn Good Search
The controversy centered on a set of puzzles known as Erdős problems. When Bubeck celebrated GPT-5’s supposed breakthrough, mathematician Thomas Bloom, who runs a website tracking these problems, quickly called him out. “This is a dramatic misrepresentation,” he wrote on X.
Bloom explained that just because his website doesn't list a solution doesn't mean a problem is unsolved; it just means he wasn't aware of one. While no single human has read all the millions of math papers out there, GPT-5 probably has. It turned out that GPT-5 hadn't generated new solutions, but had instead scoured the internet for 10 existing ones Bloom hadn't seen. It was an impressive feat of literature search, but the hype about “discovery” overshadowed what was already a genuine achievement.
The Hype Machine: A Pattern of Overstatement
This isn't an isolated incident. In August, a study showed that no LLM at the time could solve a puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with claims that GPT-5 now could, with one observer commenting that the “Lee Sedol moment is coming for many,” referencing the Go master’s 2016 loss to DeepMind’s AlphaGo.
But François Charton, a research scientist at AI startup Axiom Math, brought some perspective. “It’s a question you would give an undergrad,” he said. This reality check extends beyond math. Recent studies on LLMs in medicine and law found they were flawed at recommending treatments and often gave inconsistent legal advice. The authors of the law study concluded, “Evidence thus far spectacularly fails to meet the burden of proof.” But that’s not a message that goes down well on X, where, as Charton notes, “nobody wants to be left behind.”
Enter Axiom: A Quieter Kind of Breakthrough
Amid the noise, a genuine breakthrough occurred. Axiom, a startup founded just a few months ago, announced its model, Alpha Erdos, had solved two genuinely open Erdős problems. Days later, it solved an impressive 12 more problems in the Putnam competition, a notoriously difficult college-level math challenge.
As some researchers pointed out, the Putnam competition tests knowledge more than creativity, making it a different kind of challenge than the International Mathematical Olympiad (IMO). Judging these achievements requires a deeper dive than a social media post allows. The real question is not just *what* these models solve, but *how* they solve it.
The frantic pace of AI development, amplified by social media, is clashing with the slow, rigorous demands of scientific validation. This creates a dangerous credibility gap between what AI can actually do and what the public is led to believe. The industry's biggest challenge isn't just building smarter models, but building trust.