OpenAI Admits a Core AI Security Flaw Is 'Unlikely to Ever Be Fully Solved'
OpenAI concedes that prompt injection, a core AI security flaw, is 'unlikely to ever be fully solved.' We analyze their new defense—an AI-powered attacker—and the expert consensus on the risks of agentic AI.
OpenAI has conceded that one of the most critical vulnerabilities in modern AI systems, known as <keyword>prompt injection</keyword>, is a risk that’s not going away anytime soon. In a Monday blog post, the company stated that these attacks—which manipulate AI agents into following malicious instructions hidden in web pages or emails—are "unlikely to ever be fully ‘solved,’” comparing them to the persistent nature of scams and social engineering. This admission raises urgent questions about how safely autonomous AI agents can truly operate on the open web.
A Widely Recognized, Unsolvable Problem
OpenAI isn't alone in this grim assessment. The U.K.’s National Cyber Security Centre (NCSC) warned earlier this month that <keyword>prompt injection</keyword> attacks "may never be totally mitigated." The government agency advised cyber professionals to focus on reducing the risk and impact rather than thinking the attacks can be "stopped." For its part, OpenAI views the issue as a "long-term AI security challenge," acknowledging that its "agent mode" in <keyword>ChatGPT Atlas</keyword> significantly "expands the security threat surface."
Fighting AI with AI: The Automated Attacker
The company’s answer to this Sisyphean task is to build its own highly advanced hacker: an "LLM-based automated attacker." This bot, trained using reinforcement learning, is designed to proactively discover novel attack strategies before they are exploited in the wild. According to OpenAI, this automated attacker has already uncovered sophisticated, long-horizon attack methods that were missed by human red teaming campaigns.
Its key advantage is its insider access. The bot can see the target AI’s internal reasoning, allowing it to rapidly test an attack, study the response, tweak the exploit, and try again. While rivals like Google and Anthropic also emphasize layered defenses and continuous testing, OpenAI's internal AI attacker represents a more aggressive, self-testing approach to hardening its systems.
Expert Reality Check: Does the Value Justify the Risk?
Despite these efforts, security experts remain cautious. Rami McCarthy, principal security researcher at Wiz, told TechCrunch that the risk in AI systems can be understood as "autonomy multiplied by access." Agentic browsers like <keyword>ChatGPT Atlas</keyword> are in a particularly dangerous position, combining moderate autonomy with extremely high access to sensitive data like emails and payment information.
"For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile," McCarthy said, suggesting the trade-offs are still very real. OpenAI itself recommends users constrain their agents by giving them specific instructions and requiring user confirmation before sending messages or making payments—an implicit acknowledgment of the current limitations.
본 콘텐츠는 AI가 원문 기사를 기반으로 요약 및 분석한 것입니다. 정확성을 위해 노력하지만 오류가 있을 수 있으며, 원문 확인을 권장합니다.
관련 기사
OpenAI가 2025년 상반기 NCMEC에 제출한 아동 착취 신고 건수가 전년 동기 대비 80배 폭증했습니다. 사용자 증가와 기능 확장이 원인으로 꼽히는 가운데, AI 산업의 안전 책임 문제가 수면 위로 떠오르고 있습니다.
2025년 OpenAI는 '코드 레드' 상황 속에서 GPT-5.2를 출시하고 디즈니와 10억 달러 계약을 맺는 등 공세에 나섰지만, 동시에 심각한 저작권 및 안전성 소송에 직면했다. PRISM이 격동의 한 해를 심층 분석한다.
OpenAI의 최신 영상 생성 AI '소라 2'로 만든 가짜 아동용 장난감 광고가 틱톡에서 논란입니다. 성인용품을 연상시키는 이 영상은 AI가 어떻게 아동 착취물 제작에 악용될 수 있는지 보여주며, 기술의 윤리적 딜레마와 콘텐츠 관리의 한계를 드러냈습니다.
OpenAI가 챗GPT의 '따뜻함'과 '열정' 등 감성 톤을 직접 조절하는 기능을 출시했습니다. 이는 사용자 경험 혁신이자 AI 윤리 논란에 대한 응답입니다.