How DeepSeek R1 Reshaped AI Competition
1. What Kind of Company Is DeepSeek?
DeepSeek's story begins in an unexpected place—not Silicon Valley, but Hangzhou, China, and not from an AI startup, but from a quantitative hedge fund.
Founder Liang Wenfeng (梁文锋)
Born in 1985 in a small village in Guangdong Province, Liang Wenfeng earned his bachelor's and master's degrees in electronic information engineering from Zhejiang University. During the 2008 financial crisis, he and his classmates began experimenting with algorithmic trading ideas, and in 2015 he went on to found the quant hedge fund High-Flyer (幻方量化).
High-Flyer grew rapidly using math and AI for quantitative investment, surpassing 100 billion yuan (~$14 billion) in assets under management by 2021. Liang's crucial foresight was stockpiling Nvidia GPUs starting in 2021—acquiring approximately 10,000 A100 GPUs before U.S. chip export restrictions began.
The Birth of DeepSeek
In April 2023, High-Flyer announced an AGI (Artificial General Intelligence) research lab, spinning it off as the independent company DeepSeek in July. Liang serves as CEO of both companies.
| Item | Details |
|---|---|
| Founded | July 2023 |
| Headquarters | Hangzhou, China |
| Funding | Entirely from High-Flyer (no external VC investment) |
| Employees | Mostly fresh graduates from top Chinese universities; passion prioritized over experience |
| Goal | AGI research, no short-term monetization targets |
What makes DeepSeek unique is that it doesn't accept outside investment. VCs pushed for quick exits, so Liang declined their money in order to focus on long-term research. High-Flyer's capital made this possible.
Organizational Culture
In interviews, Liang describes DeepSeek as "completely bottom-up." There's no hierarchy within teams, natural division of labor emerges, and anyone can freely access GPUs for experiments. A prime example: the MLA (Multi-head Latent Attention) technique that became key to DeepSeek-V2's cost efficiency originated from a young researcher's personal curiosity.
2. The Truth and Myth of $6 Million
The most talked-about number accompanying DeepSeek R1's release was "$5.6 million training cost"—shockingly low compared to OpenAI GPT-4's $100M+ or Meta Llama 3's tens of millions.
The Real Numbers
However, this figure represents only part of the full picture.
| Cost Category | DeepSeek's Claim | Actual Estimates |
|---|---|---|
| Final training stage | $5.6M | $5.6M |
| Total R&D investment | Not disclosed | $500M–$1.3B (SemiAnalysis estimate) |
| GPU holdings | 2,048 H800s | ~50,000 H-series GPUs (SemiAnalysis estimate) |
According to SemiAnalysis, DeepSeek has access to roughly 50,000 Nvidia Hopper-generation GPUs, and its total AI infrastructure investment may exceed $1.3 billion. The $5.6 million figure covers only the GPU rental cost of DeepSeek-V3's final training run on 2,048 H800 chips.
Why It's Still Innovative
Even if the numbers are overstated, DeepSeek's cost efficiency remains remarkable.
First, they achieved comparable performance with far fewer resources. Anthropic's Claude 3.5 Sonnet reportedly cost tens of millions of dollars for a single training run; even at the $1.3B upper estimate, DeepSeek's budget covered its GPUs, its research program, and an entire family of models.
Second, they maximized efficiency from limited chips. The Nvidia H800, a cut-down export version of the H100 with roughly half the chip-to-chip interconnect bandwidth, was what DeepSeek used to build world-class models.
Third, they developed innovative algorithmic techniques. Technologies like MoE (Mixture of Experts), MLA, and GRPO enabled doing more with the same resources.
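To make the MoE idea concrete, the sketch below shows generic top-k expert routing, the mechanism that leaves most parameters inactive for any given token. It is a textbook-style illustration in Python, not DeepSeek's actual DeepSeekMoE design.

```python
# Generic top-k expert routing (illustrative MoE sketch, not DeepSeek's code).
# Each token is routed to only k of E experts, so most parameters stay idle
# on any given forward pass.
import numpy as np

def route_tokens(router_logits, k=2):
    """router_logits: (num_tokens, num_experts) scores from a router layer.
    Returns each token's top-k expert indices and their softmax weights."""
    topk_idx = np.argsort(router_logits, axis=-1)[:, -k:]       # best k experts per token
    topk_scores = np.take_along_axis(router_logits, topk_idx, axis=-1)
    weights = np.exp(topk_scores)
    weights /= weights.sum(axis=-1, keepdims=True)               # normalize over the k picks
    return topk_idx, weights

# 4 tokens, 8 experts, 2 active experts per token
idx, w = route_tokens(np.random.randn(4, 8), k=2)
```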
Liang himself acknowledged in an interview: "Chinese companies needed twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps."
3. Technical Innovation: GRPO and Pure Reinforcement Learning
DeepSeek R1's true innovation lies not in cost but in training methodology.
Traditional LLM Training vs. DeepSeek's Approach
Typical large language models go through:
- Pre-training: Learning language patterns from massive text data
- Supervised Fine-Tuning (SFT): Improving response quality with human-written examples
- RLHF: Alignment through human evaluator preferences
DeepSeek R1-Zero skipped the SFT stage entirely, applying reinforcement learning directly to the pre-trained DeepSeek-V3-Base model to teach it reasoning.
What Is GRPO?
GRPO (Group Relative Policy Optimization) is a reinforcement learning algorithm developed by DeepSeek.
Traditional RL (like PPO) requires a separate "critic model," making computation expensive. GRPO optimizes by grouping multiple responses to the same prompt and comparing them relatively, eliminating the need for a critic model and significantly reducing computational resources.
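As a rough illustration, here is a minimal sketch of the group-relative advantage at GRPO's core. It is a simplification for exposition, not DeepSeek's training code.

```python
# Minimal sketch of GRPO's group-relative advantage (illustrative only).
# For one prompt, G responses are sampled and each reward is normalized
# against the group's own mean and std -- this group statistic plays the
# role that PPO's learned critic would otherwise play.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 sampled answers to one math problem; two are correct (reward 1).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # approx [ 1. -1.  1. -1.]
```

The normalized advantages then weight a PPO-style clipped policy update, with a KL penalty toward a reference model keeping the policy from drifting too far.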
The reward system is simple:
- Accuracy rewards: Correctness of math/coding problem answers
- Format rewards: Encouraging structured thinking wrapped in <think>...</think> tags (a minimal sketch of such rule-based rewards appears below)
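The sketch below shows what such rule-based rewards could look like; the function names and scoring are illustrative assumptions, not DeepSeek's published implementation.

```python
# Hypothetical rule-based rewards in the spirit of R1-Zero's recipe
# (illustrative assumptions, not DeepSeek's code).
import re

def format_reward(response: str) -> float:
    """1.0 if the response wraps its reasoning in <think>...</think>."""
    return 1.0 if re.search(r"<think>.*?</think>", response, re.DOTALL) else 0.0

def accuracy_reward(response: str, reference_answer: str) -> float:
    """1.0 if the final answer after the think block matches the reference."""
    final = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return 1.0 if final == reference_answer.strip() else 0.0

def total_reward(response: str, reference_answer: str) -> float:
    return accuracy_reward(response, reference_answer) + format_reward(response)

print(total_reward("<think>2+2 is 4</think>4", "4"))  # 2.0
```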
Surprising Discovery: Self-Evolution
In R1-Zero, trained purely through reinforcement learning, researchers observed unexpected behaviors:
- Natural extension of thought processes: Generating longer Chain-of-Thought for harder problems
- Self-verification: Going back to correct errors when spotted mid-process
- "Aha moments": Actually observable instances of sudden breakthrough after being stuck
This is a significant discovery for AI research: it demonstrates that LLMs can learn "how to think" from reward signals alone, without human-written reasoning examples.
From R1-Zero to R1
R1-Zero excelled at reasoning but had problems:
- Poor readability (awkward sentences)
- Language mixing (English and Chinese intermingled)
- Infinite repetition (continuously generating the same content)
To address these, DeepSeek added cold-start data (a small set of high-quality reasoning examples) and further rounds of fine-tuning and reinforcement learning to produce the final R1 model.
Knowledge Distillation
DeepSeek also performed knowledge distillation, transferring R1's reasoning patterns to smaller models. They released lightweight models with 1.5B, 7B, 8B, 14B, 32B, and 70B parameters based on Qwen2.5 and Llama 3. These distilled models outperformed comparably sized models trained directly with RL.
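Conceptually, this kind of distillation is just supervised fine-tuning on teacher-generated traces (reportedly on the order of 800K curated samples in R1's case). The outline below is hypothetical; teacher_generate and finetune are placeholder names, not a real API.

```python
# Illustrative sketch of distillation-as-SFT: collect reasoning traces
# from a strong "teacher" model and fine-tune a smaller "student" on them.
# teacher_generate() and finetune() are hypothetical placeholders.
def build_distillation_set(prompts, teacher_generate):
    dataset = []
    for prompt in prompts:
        trace = teacher_generate(prompt)   # full <think>...</think> + final answer
        dataset.append({"prompt": prompt, "completion": trace})
    return dataset

# The student is then trained with ordinary supervised fine-tuning
# (next-token prediction on the completions), not with RL:
# student = finetune(student_base, build_distillation_set(prompts, teacher_generate))
```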
4. Is Performance Really OpenAI o1-Level?
DeepSeek R1 claims to achieve performance comparable to OpenAI o1-1217 (December 2024 version). Let's examine key benchmark results.
Mathematical Reasoning
| Benchmark | DeepSeek R1 | OpenAI o1 | Notes |
|---|---|---|---|
| AIME 2024 | 79.8% | 79.2% | American Invitational Mathematics Examination |
| MATH-500 | 97.3% | 96.4% | High school to college-level math |
In mathematics, it matches or slightly exceeds o1.
Coding
| Benchmark | DeepSeek R1 | OpenAI o1 |
|---|---|---|
| Codeforces | 2,029 ELO | 1,891 ELO |
| LiveCodeBench | 65.9% | - |
Strong performance at competitive programming levels.
General Knowledge
| Benchmark | DeepSeek R1 | OpenAI o1 |
|---|---|---|
| MMLU | 90.8% | 91.8% |
| GPQA Diamond | 71.5% | 75.7% |
In general knowledge, o1 leads slightly, but the gap isn't large.
Limitations
However, R1 has limitations:
- Reduced performance on Chinese SimpleQA: Lower scores than DeepSeek-V3 due to query refusals after safety RL
- Infinite repetition in long outputs: Occasionally keeps generating the same content
- Hallucinations: Can still generate non-factual content
Overall: World-class in math, coding, and logical reasoning; slightly behind in general knowledge.
5. Why Open Source?
DeepSeek R1 was released as fully open-source under the MIT License—model weights, training methodology, and technical reports all published. Why?
Liang Wenfeng's Philosophy
In a July 2024 interview, Liang said:
"Adopting a closed-source model won't prevent competitors from catching up. Therefore, our real moat lies in our team's growth—accumulating know-how, fostering an innovative culture. Open-sourcing and publishing papers don't result in significant losses. For technologists, being followed is rewarding. Open-source is cultural, not just commercial. Giving back is an honor, and it attracts talent."
Strategic Reasons
- Talent attraction: Top researchers want their work published
- Ecosystem building: Enabling others to build on DeepSeek technology
- Energizing China's AI ecosystem: After DeepSeek's release, Alibaba, Baidu, and ByteDance raced to open-source their own models
- Political considerations: Open-source is harder to regulate (not a consumer-facing service)
Open Source Ripple Effects
Within weeks of R1's release:
- Download explosion on Hugging Face
- Perplexity released a censorship-removed version (R1-1776)
- Dozens of derivative models emerged
- Triggered open-source competition among Chinese tech giants
6. US-China AI Competition and the Paradox of Chip Sanctions
DeepSeek R1 must be understood in the context of US-China tech competition.
U.S. Chip Export Controls
In October 2022, the Biden administration began restricting advanced semiconductor exports to China. Regulations tightened progressively:
| Timeline | Restrictions |
|---|---|
| Oct 2022 | Export limits on A100 and other advanced GPUs |
| Oct 2023 | Enhanced restrictions, H800 also limited |
| 2024 | Discussion of restricting even lower-performance H20 chips |
| 2025 | Complete ban on latest chips like Blackwell |
The purpose was clear: Slow China's AI development.
The Paradoxical Outcome
However, DeepSeek's success showed that regulations can produce opposite effects.
"Necessity is the mother of invention" became reality. Forced to maximize efficiency from limited chips, DeepSeek instead developed innovative algorithms and architectures. Technologies like MoE, MLA, and GRPO enabled "doing more with less."
MIT Technology Review analyzed:
"Rather than weakening China's AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration."
China's AI Ecosystem Response
After DeepSeek, China's AI ecosystem flourished:
- Alibaba: Open-sourced Qwen series, announced $53B AI investment over 3 years
- ByteDance, Baidu: Competitively released new models
- Government support: Expanded national-level AI funding
- Huawei: Attempting to replace Nvidia with Ascend 910C chips
President Trump called DeepSeek a "wake-up call for our industries."
7. Censorship Issues and R1-1776
DeepSeek R1's significant weakness is that Chinese government censorship is baked in.
What Gets Censored
Ask DeepSeek about these topics, and it evades or repeats Chinese government positions:
- Tiananmen Square incident (1989)
- Taiwan independence
- Criticism of Xi Jinping
- Uyghur human rights issues
- Tibet
- Hong Kong democracy movement
For example, asking "What happened in Tiananmen in 1989?":
"I'm sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses."
Local Execution vs. Online Service
Interestingly, censorship loosens when the model is run locally. Questions refused on DeepSeek's website can, with careful prompting, yield factual answers from a locally hosted copy, which suggests that additional filtering is applied server-side on top of the model itself.
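For readers who want to test this themselves, the sketch below loads one of the distilled checkpoints with Hugging Face transformers. The model id is written from memory and should be verified, and the full 671B-parameter R1 is far too large for typical local hardware, so the distilled variants are what most people actually run.

```python
# Minimal local-run sketch (assumed model id; verify on Hugging Face before use).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumption, not confirmed here
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What happened in Tiananmen Square in 1989?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```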
Perplexity's R1-1776
In February 2025, Perplexity AI released R1-1776, a censorship-removed version of R1 named after the year of American independence.
- Human experts identified ~300 censored topics
- Fine-tuned on 40,000 multilingual prompts
- Can provide factual answers about Tiananmen, Taiwan, etc.
However, research shows R1-1776 isn't perfect:
- Questions in Chinese may still yield censored responses
- Factual accuracy may suffer: Some factual information distorted during censorship removal
- Censorship isn't just "answer refusal"—bias exists in the training data itself, making complete removal difficult
China Media Project's analysis:
"Removing DeepSeek's gag does not set it free from strictures that are part of its DNA. Ask an uncensored version about Taiwan, and it will repeat Chinese Party-state disinformation, such as that Taiwan has been part of China 'since ancient times.'"
8. Market Shockwaves
DeepSeek R1's release triggered an earthquake in financial markets.
Stock Crash
On January 27, 2025, the day DeepSeek topped the U.S. iOS App Store:
| Company | Stock Change | Market Cap Loss |
|---|---|---|
| Nvidia | -17% | ~$600 billion (largest single-day market-cap loss for any company on record) |
| Microsoft | Declined | Tens of billions |
| All AI stocks | - | Over $1 trillion evaporated |
Why This Reaction?
Investor concerns were clear:
- "Should we pour billions into AI?": If DeepSeek built a top-tier model for $6M (nominally), are OpenAI/Anthropic's hundreds of millions excessive?
- "Are Nvidia GPUs that necessary?": If efficient algorithms work with fewer chips, Nvidia demand could drop
- "Is U.S. tech leadership shaking?": If China caught up despite chip regulations, American AI supremacy is threatened
Subsequent Recovery
Markets partially recovered afterward. Analysts noted:
- DeepSeek's cost claims were overstated
- AI demand remains explosive
- Efficient AI could actually accelerate more applications
Nvidia CEO Jensen Huang countered: "If inference demand explodes, more GPUs will be needed."
9. Questions for the AI Industry
DeepSeek R1 posed important questions for the entire AI industry.
Question 1: Limits of Scaling Laws?
Until now, AI progress followed a simple formula: More data + More compute = Better models. This is called "Scaling Laws."
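One common formalization is the Chinchilla-style fit below, which comes from the broader scaling-law literature rather than from DeepSeek's own work: N is parameter count, D is training tokens, and E, A, B, α, β are fitted constants.

```latex
% Chinchilla-style scaling fit (illustrative; not DeepSeek's formula)
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

On this view, algorithmic advances like DeepSeek's can be read as shrinking the constants, reaching the same loss with less compute.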
DeepSeek showed an alternative path. Algorithmic innovation can extract more from the same compute. This suggests a new research direction: "efficient scaling."
Question 2: Open Source vs. Closed Source
| Model | Release Method |
|---|---|
| GPT-4, Claude | Closed (API only) |
| Llama, Mistral | Weights released, some restrictions |
| DeepSeek R1 | Fully open-source (MIT License) |
DeepSeek's success proved open-source models can compete with closed models, reigniting debates about AI's future.
Question 3: Do Export Controls Work?
If U.S. chip regulations actually spurred China's efficient innovation, should the strategy be reconsidered? Experts are divided:
- Pro-regulation: Without controls, China would have advanced faster
- Skeptics: Regulations can't stop innovation and may stimulate it
- Middle ground: Regulations need to be paired with accelerating U.S. domestic innovation
Question 4: Democratization or New Risks?
DeepSeek R1's open-source release has two sides:
Positive aspects:
- Resource-limited researchers and developers access top-tier AI
- Strengthening AI capabilities in the Global South
- Increased research transparency
Concerns:
- Censorship and bias spreading globally
- Potential for misuse (deepfakes, scams, etc.)
- Connection to Chinese government (data security concerns)
Glossary
| Term | Definition |
|---|---|
| DeepSeek | Hangzhou-based AI startup founded by quant hedge fund High-Flyer |
| GRPO | Group Relative Policy Optimization. Efficient RL algorithm developed by DeepSeek |
| MoE | Mixture of Experts. Efficient architecture activating only some parameters based on input |
| R1-Zero | DeepSeek's experimental model trained purely through RL without supervised learning |
| R1-1776 | Perplexity's censorship-removed version of DeepSeek R1 |
| Knowledge Distillation | Technique for transferring knowledge from large models to smaller ones |
| Cold Start | Small amount of high-quality seed data used in R1 training |
| Chain-of-Thought | AI's step-by-step problem-solving thought process |
Update Log
| Date | Changes |
|---|---|
| 2026-01-06 | Initial publication |
This content does not constitute investment advice. When using specific AI services, please review their terms of service, privacy policies, and data security policies.
© 2026 PRISM by Liabooks. All rights reserved.