AI's "Embarrassing" Math Moment: Why Hype is Hurting Real Progress
When an OpenAI researcher claimed GPT-5 solved 10 unsolved math problems, Google DeepMind's CEO called it 'embarrassing.' Here's the real story behind the AI hype and what it means for actual progress.
Google DeepMind CEO Demis Hassabis needed only three words: “This is embarrassing.” He was replying on X to a breathless post by Sébastien Bubeck, a researcher at rival firm OpenAI. Bubeck had announced that GPT-5, OpenAI's latest large language model, had solved 10 unsolved problems in mathematics. “Science acceleration via AI has officially begun,” he crowed. This mid-October skirmish is a perfect example of what’s wrong with AI right now.
Not a Discovery, But a Damn Good Search
The controversy centered on a set of puzzles known as Erdős problems. When Bubeck celebrated GPT-5’s supposed breakthrough, mathematician Thomas Bloom, who runs a website tracking these problems, quickly called him out. “This is a dramatic misrepresentation,” he wrote on X.
Bloom explained that the absence of a solution on his website doesn't mean a problem is unsolved; it only means he wasn't aware of one. While no single human has read all the millions of math papers out there, GPT-5 probably has. It turned out that GPT-5 hadn't generated new solutions, but had instead scoured the internet for 10 existing ones Bloom hadn't seen. It was an amazing feat of literature search, but the hype about “discovery” overshadowed what was already a cool achievement.
The Hype Machine: A Pattern of Overstatement
This isn't an isolated incident. In August, a study showed no LLM at the time could solve a puzzle known as Yu Tsumura’s 554th Problem. Two months later, social media erupted with claims that GPT-5 now could. One observer commented, “Lee Sedol moment is coming for many,” a reference to the Go master’s 2016 loss to DeepMind’s AlphaGo.
But François Charton, a research scientist at AI startup Axiom Math, brought some perspective. “It’s a question you would give an undergrad,” he said. This reality check extends beyond math. Recent studies on LLMs in medicine and law found they were flawed at recommending treatments and often gave inconsistent legal advice. The authors of the law study concluded, “Evidence thus far spectacularly fails to meet the burden of proof.” But that’s not a message that goes down well on X, where, as Charton notes, “nobody wants to be left behind.”
Enter Axiom: A Quieter Kind of Breakthrough
Amid the noise, a genuine breakthrough occurred. Axiom, a startup founded just a few months ago, announced its model, Alpha Erdos, had solved two genuinely open Erdős problems. Days later, it solved an impressive 12 more problems in the Putnam competition, a notoriously difficult college-level math challenge.
But as some researchers pointed out, the Putnam competition tests knowledge more than creativity, making it a different kind of challenge than the International Mathematical Olympiad (IMO). Judging these achievements requires a deeper dive than a social media post allows. The real question is not just *what* these models solve, but *how* they do it.
The frantic pace of AI development, amplified by social media, is clashing with the slow, rigorous demands of scientific validation. This creates a dangerous credibility gap between what AI can actually do and what the public is led to believe. The industry's biggest challenge isn't just building smarter models, but building trust.
This content was summarized and analyzed by AI based on the original article. We strive for accuracy, but errors may occur, and we recommend checking the original article.