AI's "Embarrassing" Math Moment: Why Hype is Hurting Real Progress
When an OpenAI researcher claimed GPT-5 solved 10 unsolved math problems, Google DeepMind's CEO called it 'embarrassing.' Here's the real story behind the AI hype and what it means for actual progress.
Google DeepMind CEO Demis Hassabis needed only three words: “This is embarrassing.” He was replying on X to a breathless post by Sébastien Bubeck, a researcher at rival firm OpenAI. Bubeck had announced that GPT-5, OpenAI's latest large language model, had solved 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” he crowed. This mid-October skirmish is a perfect example of what’s wrong with AI right now.
Not a Discovery, But a Damn Good Search
The controversy centered on a set of puzzles known as Erdős problems. When Bubeck celebrated GPT-5’s supposed breakthrough, mathematician Thomas Bloom, who runs a website tracking these problems, quickly called him out. “This is a dramatic misrepresentation,” he wrote on X.
Bloom explained that just because his website doesn't list a solution doesn't mean a problem is unsolved; it just means he wasn't aware of one. While no single human has read all the millions of math papers out there, GPT-5 probably has. It turned out that GPT-5 hadn't generated new solutions, but had instead scoured the internet for 10 existing ones Bloom hadn't seen. It was an impressive feat of literature search, but the hype about a “discovery” overshadowed what was already a notable achievement.
The Hype Machine: A Pattern of Overstatement
This isn't an isolated incident. In August, a study showed that no LLM at the time could solve a puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with claims that GPT-5 now could, with one observer commenting that the “Lee Sedol moment is coming for many,” referencing the Go master’s 2016 loss to DeepMind’s AlphaGo.
But François Charton, a research scientist at AI startup Axiom Math, brought some perspective. “It’s a question you would give an undergrad,” he said. The reality check extends beyond math. Recent studies of LLMs in medicine and law found that they made flawed treatment recommendations and often gave inconsistent legal advice. The authors of the law study concluded, “Evidence thus far spectacularly fails to meet the burden of proof.” But that’s not a message that goes down well on X, where, as Charton notes, “nobody wants to be left behind.”
Enter Axiom: A Quieter Kind of Breakthrough
Amid the noise, a genuine breakthrough occurred. Axiom, a startup founded just a few months ago, announced that its model, Alpha Erdos, had solved two genuinely open Erdős problems. Days later, it solved 12 problems from the Putnam competition, a notoriously difficult college-level math contest.
As some researchers pointed out, the Putnam competition tests knowledge more than creativity, making it a different kind of challenge from the International Mathematical Olympiad (IMO). Judging these achievements requires a deeper dive than a social media post allows. The real question is not just *what* these models solve, but *how* they do it.
The frantic pace of AI development, amplified by social media, is clashing with the slow, rigorous demands of scientific validation. This creates a dangerous credibility gap between what AI can actually do and what the public is led to believe. The industry's biggest challenge isn't just building smarter models, but building trust.