Liabooks Home|PRISM News
OpenAI Admits a Core AI Security Flaw Is 'Unlikely to Ever Be Fully Solved'
TechAI分析

OpenAI Admits a Core AI Security Flaw Is 'Unlikely to Ever Be Fully Solved'

Source

OpenAI concedes that prompt injection, a core AI security flaw, is 'unlikely to ever be fully solved.' We analyze their new defense—an AI-powered attacker—and the expert consensus on the risks of agentic AI.

OpenAI has conceded that one of the most critical vulnerabilities in modern AI systems, known as <keyword>prompt injection</keyword>, is a risk that’s not going away anytime soon. In a Monday blog post, the company stated that these attacks—which manipulate AI agents into following malicious instructions hidden in web pages or emails—are "unlikely to ever be fully ‘solved,’” comparing them to the persistent nature of scams and social engineering. This admission raises urgent questions about how safely autonomous AI agents can truly operate on the open web.

A Widely Recognized, Unsolvable Problem

OpenAI isn't alone in this grim assessment. The U.K.’s National Cyber Security Centre (NCSC) warned earlier this month that <keyword>prompt injection</keyword> attacks "may never be totally mitigated." The government agency advised cyber professionals to focus on reducing the risk and impact rather than thinking the attacks can be "stopped." For its part, OpenAI views the issue as a "long-term AI security challenge," acknowledging that its "agent mode" in <keyword>ChatGPT Atlas</keyword> significantly "expands the security threat surface."

Fighting AI with AI: The Automated Attacker

The company’s answer to this Sisyphean task is to build its own highly advanced hacker: an "LLM-based automated attacker." This bot, trained using reinforcement learning, is designed to proactively discover novel attack strategies before they are exploited in the wild. According to OpenAI, this automated attacker has already uncovered sophisticated, long-horizon attack methods that were missed by human red teaming campaigns.

Its key advantage is its insider access. The bot can see the target AI’s internal reasoning, allowing it to rapidly test an attack, study the response, tweak the exploit, and try again. While rivals like Google and Anthropic also emphasize layered defenses and continuous testing, OpenAI's internal AI attacker represents a more aggressive, self-testing approach to hardening its systems.

Expert Reality Check: Does the Value Justify the Risk?

Despite these efforts, security experts remain cautious. Rami McCarthy, principal security researcher at Wiz, told TechCrunch that the risk in AI systems can be understood as "autonomy multiplied by access." Agentic browsers like <keyword>ChatGPT Atlas</keyword> are in a particularly dangerous position, combining moderate autonomy with extremely high access to sensitive data like emails and payment information.

"For most everyday use cases, agentic browsers don’t yet deliver enough value to justify their current risk profile," McCarthy said, suggesting the trade-offs are still very real. OpenAI itself recommends users constrain their agents by giving them specific instructions and requiring user confirmation before sending messages or making payments—an implicit acknowledgment of the current limitations.

PRISM Insight: The New Arms Race: The power of agentic AI is also its poison. The industry is waking up to the fact that AI security isn't a bug to be patched but a chronic condition to be managed. This signals a shift from purely technical defenses to architectural ones, where 'user-in-the-loop' confirmation and constrained autonomy become core design principles, not just safety features. The future of powerful AI agents may be less about full autonomy and more about building a highly effective, but leashed, co-pilot.

本コンテンツはAIが原文記事を基に要約・分析したものです。正確性に努めていますが、誤りがある可能性があります。原文の確認をお勧めします。

OpenAIAI AgentsCybersecurityAI SecurityPrompt InjectionChatGPT Atlas

関連記事

ChatGPT、あなただけの2025年を振り返る「イヤー・イン・レビュー」機能を発表
TechJP
ChatGPT、あなただけの2025年を振り返る「イヤー・イン・レビュー」機能を発表

OpenAIのChatGPTが、2025年の利用状況をまとめた「イヤー・イン・レビュー」機能を公開。送信メッセージ数や、AIが生成するあなただけのピクセルアートで一年を振り返りましょう。

OpenAIが認める「終わらない戦い」:プロンプトインジェクション攻撃とAIエージェントの未来
TechJP
OpenAIが認める「終わらない戦い」:プロンプトインジェクション攻撃とAIエージェントの未来

OpenAIが、AIエージェントへの「プロンプトインジェクション」攻撃は完全には解決できない問題だと認めました。同社がこの終わらない戦いにどう立ち向かうのか、AIハッカーを用いた独自の防衛策と専門家の見解を解説します。

AIがAIを守る時代へ:OpenAI、強化学習でChatGPTの「プロンプト注入攻撃」対策を自動化
TechJP
AIがAIを守る時代へ:OpenAI、強化学習でChatGPTの「プロンプト注入攻撃」対策を自動化

OpenAIがChatGPT Atlasのプロンプト注入攻撃対策を強化。強化学習を用いた自動レッドチームで、AIエージェントのセキュリティをプロアクティブに防御する最新動向を解説します。

OpenAI、企業顧客100万社を突破 – AIはビジネスの「標準装備」へ
TechJP
OpenAI、企業顧客100万社を突破 – AIはビジネスの「標準装備」へ

OpenAIが、ペイパルやシスコなどを含む世界の企業顧客数が100万社を突破したと発表。生成AIが多様な業界で標準的なビジネスツールへと進化している現状を解説します。