OpenAI Wants Your Real Work Files: The Hidden Strategy for Next-Gen AI Model Evaluation
OpenAI is asking contractors to upload real work files to set a human baseline for its next-gen AI models. This move highlights a new data strategy and significant legal risks.
Your past work reports might be the secret sauce in the push toward AGI. OpenAI is reportedly asking third-party contractors to upload real assignments and deliverables from their current or previous workplaces to evaluate the performance of its next-generation AI models.
Establishing a Human Baseline for OpenAI's Model Evaluation
According to internal records obtained by WIRED, the project is part of a broader effort to create a "human baseline" for complex tasks. OpenAI believes that how its models measure up against high-level human professionals is a key indicator of progress toward AGI, an AI that outperforms humans at most economically valuable tasks.
Contractors are instructed to turn their "long-term or complex work" into tasks for the AI, which includes uploading actual files: Word docs, PDFs, PowerPoints, and even code repositories. One example in a presentation featured a two-page PDF yacht trip itinerary created for ultra-high-net-worth individuals, cited as an example of an "experienced human deliverable."
The Legal Minefield of Trade Secrets
The practice has raised red flags among legal experts. OpenAI advises contractors to use a tool called "Superstar Scrubbing" to remove proprietary information, but intellectual property lawyers warn that the safeguard is a gamble: contractors who share documents from previous employers could still be violating non-disclosure agreements (NDAs) or exposing trade secrets.
This isn't the first time OpenAI has sought sensitive data. Sources indicate the company once inquired about purchasing internal communications and emails from failed firms, though the deal didn't proceed due to concerns over data scrubbing reliability.