
How DeepSeek R1 Reshaped AI Competition



1. What Kind of Company Is DeepSeek?

DeepSeek's story begins in an unexpected place—not Silicon Valley, but Hangzhou, China, and not from an AI startup, but from a quantitative hedge fund.

Founder Liang Wenfeng (梁文锋)

Born in 1985 in a small village in Guangdong Province, Liang Wenfeng earned his bachelor's and master's degrees in electronic information engineering from Zhejiang University. During the 2008 financial crisis, he and classmates conceived algorithmic trading ideas, later founding the quant hedge fund High-Flyer (幻方量化) in 2015.

High-Flyer grew rapidly using math and AI for quantitative investment, surpassing 100 billion yuan (~$14 billion) in assets under management by 2021. Liang's crucial foresight was stockpiling Nvidia GPUs starting in 2021—acquiring approximately 10,000 A100 GPUs before U.S. chip export restrictions began.

The Birth of DeepSeek

In April 2023, High-Flyer announced an AGI (Artificial General Intelligence) research lab, spinning it off as the independent company DeepSeek in July. Liang serves as CEO of both companies.

| Item | Details |
| --- | --- |
| Founded | July 2023 |
| Headquarters | Hangzhou, China |
| Funding | Entirely from High-Flyer (no external VC investment) |
| Employees | Mostly fresh graduates from top Chinese universities; passion prioritized over experience |
| Goal | AGI research, no short-term monetization targets |

What makes DeepSeek unique is that it doesn't accept outside investment. VCs wanted quick exits, so Liang declined their money in order to focus on long-term research. High-Flyer's capital made this possible.

Organizational Culture

In interviews, Liang describes DeepSeek as "completely bottom-up." There's no hierarchy within teams, natural division of labor emerges, and anyone can freely access GPUs for experiments. A prime example: the MLA (Multi-head Latent Attention) technique that became key to DeepSeek-V2's cost efficiency originated from a young researcher's personal curiosity.


2. The Truth and Myth of $6 Million

The most talked-about number accompanying DeepSeek R1's release was "$5.6 million training cost"—shockingly low compared to OpenAI GPT-4's $100M+ or Meta Llama 3's tens of millions.

The Real Numbers

However, this figure represents only part of the full picture.

| Cost Category | DeepSeek's Claim | Actual Estimates |
| --- | --- | --- |
| Final training stage | $5.6M | $5.6M |
| Total R&D investment | Not disclosed | $500M–$1.3B (SemiAnalysis estimate) |
| GPU holdings | 2,048 H800s | Up to 50,000 H-series (estimated) |

According to SemiAnalysis, DeepSeek possesses at least 50,000 Nvidia H-series GPUs, with total AI infrastructure investment potentially exceeding $1.3 billion. The $5.6 million covers only GPU rental costs for DeepSeek-V3's final training stage using 2,048 H800 chips.

Why It's Still Innovative

Even if the headline savings are overstated, DeepSeek's cost efficiency remains remarkable.

First, they achieved comparable performance with far fewer resources. Anthropic reportedly spent "tens of millions" of dollars training Claude 3.5 Sonnet alone, whereas DeepSeek's estimated total investment of up to $1.3 billion covers its entire infrastructure and a whole family of models.

Second, they maximized efficiency from limited chips. DeepSeek built world-class models on the Nvidia H800, a variant whose chip-to-chip interconnect bandwidth was roughly halved relative to the H100 to comply with U.S. export restrictions.

Third, they developed innovative algorithmic techniques. Technologies like MoE (Mixture of Experts), MLA, and GRPO enabled doing more with the same resources.
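To make the "more from the same resources" point concrete, here is a minimal, hypothetical Mixture-of-Experts layer in PyTorch: a router scores the experts and only the top-k run for each token, so most parameters stay idle on any given input. The dimensions and module layout are illustrative, not DeepSeek's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: only the top-k of n experts run per token."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.k = k

    def forward(self, x):                             # x: (num_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # only the selected experts do any work
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)                                # 4 tokens, hidden size 64
print(TinyMoE()(x).shape)                             # torch.Size([4, 64])
```

DeepSeek's production models use far larger and more elaborate MoE designs, but the routing principle is the same: total parameter count grows while per-token compute stays bounded.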

Liang himself acknowledged in an interview: "Chinese companies needed twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps."


3. Technical Innovation: GRPO and Pure Reinforcement Learning

DeepSeek R1's true innovation lies not in cost but in training methodology.

Traditional LLM Training vs. DeepSeek's Approach

Typical large language models go through:

  1. Pre-training: Learning language patterns from massive text data
  2. Supervised Fine-Tuning (SFT): Improving response quality with human-written examples
  3. RLHF (Reinforcement Learning from Human Feedback): Alignment with the preferences of human evaluators

DeepSeek R1-Zero completely skipped step 2 (SFT). They applied reinforcement learning directly to the pre-trained DeepSeek-V3-Base to teach reasoning capabilities.

What Is GRPO?

GRPO (Group Relative Policy Optimization) is a reinforcement learning algorithm developed by DeepSeek.

Traditional RL (like PPO) requires a separate "critic model," making computation expensive. GRPO optimizes by grouping multiple responses to the same prompt and comparing them relatively, eliminating the need for a critic model and significantly reducing computational resources.
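As a rough, illustrative sketch (not DeepSeek's actual code), the group-relative advantage can be computed by normalizing each sampled response's reward against the mean and standard deviation of its own group, with no learned critic involved:

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantage: score each sampled response against its own group,
    replacing the learned value/critic model that PPO-style methods require."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 responses sampled for one prompt, rewarded 1.0 only if the answer is correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))   # correct answers get positive advantages
```

These advantages then weight a clipped policy-gradient update much like PPO's, but the baseline comes from the group statistics rather than from a separate value network.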

The reward system is simple (a code sketch follows this list):

  • Accuracy rewards: Correctness of math/coding problem answers
  • Format rewards: Encouraging structured thinking processes like <think>...</think>
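A minimal sketch of what such rule-based rewards might look like; the regex checks, the \boxed{} answer convention, and the 0.2 format bonus are illustrative assumptions, not DeepSeek's published reward code:

```python
import re

def accuracy_reward(response: str, reference_answer: str) -> float:
    """Return 1.0 when the boxed (or last-line) answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    predicted = match.group(1) if match else response.strip().splitlines()[-1]
    return 1.0 if predicted.strip() == reference_answer.strip() else 0.0

def format_reward(response: str) -> float:
    """Small bonus when the reasoning is wrapped in <think>...</think> tags."""
    return 0.2 if re.search(r"<think>.*?</think>", response, re.DOTALL) else 0.0

resp = "<think>2 + 2 = 4</think>\nThe final answer is \\boxed{4}"
print(accuracy_reward(resp, "4") + format_reward(resp))   # 1.2
```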

Surprising Discovery: Self-Evolution

In R1-Zero, trained purely through reinforcement learning, researchers observed unexpected behaviors:

  • Natural extension of thought processes: Generating longer Chain-of-Thought for harder problems
  • Self-verification: Going back to correct errors when spotted mid-process
  • "Aha moments": Actually observable instances of sudden breakthrough after being stuck

This is a significant AI research discovery—demonstrating that LLMs can learn "how to think" without human supervision.

From R1-Zero to R1

R1-Zero excelled at reasoning but had problems:

  • Poor readability (awkward sentences)
  • Language mixing (English and Chinese intermingled)
  • Infinite repetition (continuously generating the same content)

To address these, DeepSeek added Cold Start data (small amounts of high-quality examples) and additional fine-tuning to complete the final R1 model.

Knowledge Distillation

DeepSeek also performed knowledge distillation, transferring R1's reasoning patterns to smaller models. They released lightweight models with 1.5B, 7B, 8B, 14B, 32B, and 70B parameters based on Qwen2.5 and Llama3. These smaller models performed better than those trained directly with RL.
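Conceptually, this distillation step is ordinary supervised fine-tuning on teacher-generated data: the large model writes out reasoning traces, and the small model is trained to reproduce them. Below is a minimal sketch of the data-preparation side, using a hand-written example and a common chat-style JSONL format; the field names are assumptions, not DeepSeek's pipeline.

```python
import json

# Illustrative teacher output: a long <think> trace plus the final answer, as an
# R1-style teacher would produce it (this sample is hand-written, not real data).
teacher_samples = [
    {"prompt": "What is 12 * 13?",
     "trace": "<think>12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156</think>\nThe answer is 156."},
]

# Distillation here is plain supervised fine-tuning: the small student model is trained
# to reproduce the teacher's reasoning text token by token.
with open("distill_sft.jsonl", "w", encoding="utf-8") as f:
    for s in teacher_samples:
        f.write(json.dumps({"messages": [
            {"role": "user", "content": s["prompt"]},
            {"role": "assistant", "content": s["trace"]},
        ]}) + "\n")
```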


4. Is Performance Really OpenAI o1-Level?

DeepSeek R1 claims to achieve performance comparable to OpenAI o1-1217 (December 2024 version). Let's examine key benchmark results.

Mathematical Reasoning

| Benchmark | DeepSeek R1 | OpenAI o1 | Notes |
| --- | --- | --- | --- |
| AIME 2024 | 79.8% | 79.2% | American Invitational Mathematics Examination |
| MATH-500 | 97.3% | 96.4% | High school to college-level math |

In mathematics, it matches or slightly exceeds o1.

Coding

| Benchmark | DeepSeek R1 | OpenAI o1 |
| --- | --- | --- |
| Codeforces | 2,029 Elo | 1,891 Elo |
| LiveCodeBench | 65.9% | - |

Strong performance at competitive programming levels.

General Knowledge

| Benchmark | DeepSeek R1 | OpenAI o1 |
| --- | --- | --- |
| MMLU | 90.8% | 91.8% |
| GPQA Diamond | 71.5% | 75.7% |

In general knowledge, o1 leads slightly, but the gap isn't large.

Limitations

However, R1 has limitations:

  • Reduced performance on Chinese SimpleQA: Lower scores than DeepSeek-V3 due to query refusals after safety RL
  • Infinite repetition in long outputs: Occasionally keeps generating the same content
  • Hallucinations: Can still generate non-factual content

Overall: World-class in math, coding, and logical reasoning; slightly behind in general knowledge.


5. Why Open Source?

DeepSeek R1 was released as fully open-source under the MIT License—model weights, training methodology, and technical reports all published. Why?

Liang Wenfeng's Philosophy

In a July 2024 interview, Liang said:

"Adopting a closed-source model won't prevent competitors from catching up. Therefore, our real moat lies in our team's growth—accumulating know-how, fostering an innovative culture. Open-sourcing and publishing papers don't result in significant losses. For technologists, being followed is rewarding. Open-source is cultural, not just commercial. Giving back is an honor, and it attracts talent."

Strategic Reasons

  1. Talent attraction: Top researchers want their work published
  2. Ecosystem building: Enabling others to build on DeepSeek technology
  3. Energizing China's AI ecosystem: After DeepSeek's release, Alibaba, Baidu, ByteDance competitively opened their models
  4. Political considerations: Open-source is harder to regulate (not a consumer-facing service)

Open Source Ripple Effects

Within weeks of R1's release:

  • Download explosion on Hugging Face
  • Perplexity released a censorship-removed version (R1-1776)
  • Dozens of derivative models emerged
  • Triggered open-source competition among Chinese tech giants

6. US-China AI Competition and the Paradox of Chip Sanctions

DeepSeek R1 must be understood in the context of US-China tech competition.

U.S. Chip Export Controls

In October 2022, the Biden administration began restricting advanced semiconductor exports to China. Regulations tightened progressively:

| Timeline | Restrictions |
| --- | --- |
| Oct 2022 | Export limits on A100 and other advanced GPUs |
| Oct 2023 | Enhanced restrictions; H800 also limited |
| 2024 | Discussion of restricting even lower-performance H20 chips |
| 2025 | Complete ban on latest chips like Blackwell |

The purpose was clear: Slow China's AI development.

The Paradoxical Outcome

However, DeepSeek's success showed that regulations can produce opposite effects.

"Necessity is the mother of invention" became reality. Forced to maximize efficiency from limited chips, DeepSeek instead developed innovative algorithms and architectures. Technologies like MoE, MLA, and GRPO enabled "doing more with less."

MIT Technology Review analyzed:

"Rather than weakening China's AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration."

China's AI Ecosystem Response

After DeepSeek, China's AI ecosystem flourished:

  • Alibaba: Open-sourced Qwen series, announced $53B AI investment over 3 years
  • ByteDance, Baidu: Competitively released new models
  • Government support: Expanded national-level AI funding
  • Huawei: Attempting to replace Nvidia with Ascend 910C chips

President Trump called DeepSeek a "wake-up call for our industries."


7. Censorship Issues and R1-1776

DeepSeek R1's significant weakness is that Chinese government censorship is baked in.

What Gets Censored

Ask DeepSeek about these topics, and it evades or repeats Chinese government positions:

  • Tiananmen Square incident (1989)
  • Taiwan independence
  • Criticism of Xi Jinping
  • Uyghur human rights issues
  • Tibet
  • Hong Kong democracy movement

For example, asking "What happened in Tiananmen in 1989?":

"I'm sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses."

Local Execution vs. Online Service

Interestingly, censorship loosens when running locally. Questions refused on DeepSeek's website can (with careful prompting) yield factual answers when run locally—suggesting server-side additional filtering exists.
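For readers who want to test this themselves, one common way to run a distilled R1 variant locally is via Hugging Face's transformers library. The sketch below assumes the publicly released DeepSeek-R1-Distill-Qwen-1.5B checkpoint and enough memory to load it; verify the exact repository name on Hugging Face before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"   # assumed repository name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a response entirely on local hardware.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Briefly explain chain-of-thought reasoning."}],
    tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(output[0], skip_special_tokens=True))
```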

Perplexity's R1-1776

In February 2025, Perplexity AI released R1-1776, a censorship-removed version of R1 named after the year of American independence.

  • Human experts identified ~300 censored topics
  • Fine-tuned on 40,000 multilingual prompts
  • Can provide factual answers about Tiananmen, Taiwan, etc.

However, research shows R1-1776 isn't perfect:

  • Questions in Chinese may still yield censored responses
  • Factual accuracy may suffer: Some factual information distorted during censorship removal
  • Censorship isn't just "answer refusal"—bias exists in the training data itself, making complete removal difficult

China Media Project's analysis:

"Removing DeepSeek's gag does not set it free from strictures that are part of its DNA. Ask an uncensored version about Taiwan, and it will repeat Chinese Party-state disinformation, such as that Taiwan has been part of China 'since ancient times.'"


8. Market Shockwaves

DeepSeek R1's release triggered an earthquake in financial markets.

Stock Crash

On January 27, 2025, the day DeepSeek topped the U.S. iOS App Store:

| Company | Stock Change | Market Cap Loss |
| --- | --- | --- |
| Nvidia | -17% | ~$600 billion (largest single-day loss in value by one company in U.S. market history) |
| Microsoft | Decline | Tens of billions |
| Google | Decline | Tens of billions |
| All AI stocks | - | Over $1 trillion evaporated |

Why This Reaction?

Investor concerns were clear:

  1. "Should we pour billions into AI?": If DeepSeek built a top-tier model for $6M (nominally), are OpenAI/Anthropic's hundreds of millions excessive?
  1. "Are Nvidia GPUs that necessary?": If efficient algorithms work with fewer chips, Nvidia demand could drop
  1. "Is U.S. tech leadership shaking?": If China caught up despite chip regulations, American AI supremacy is threatened

Subsequent Recovery

Markets partially recovered afterward. Analysts noted:

  • DeepSeek's cost claims were overstated
  • AI demand remains explosive
  • Efficient AI could actually accelerate more applications

Nvidia CEO Jensen Huang countered: "If inference demand explodes, more GPUs will be needed."


9. Questions for the AI Industry

DeepSeek R1 posed important questions for the entire AI industry.

Question 1: Limits of Scaling Laws?

Until now, AI progress has followed a simple formula: more data plus more compute yields better models. These are the so-called scaling laws.

DeepSeek showed an alternative path. Algorithmic innovation can extract more from the same compute. This suggests a new research direction: "efficient scaling."

Question 2: Open Source vs. Closed Source

| Model | Release Method |
| --- | --- |
| GPT-4, Claude | Closed (API only) |
| Llama, Mistral | Weights released, some restrictions |
| DeepSeek R1 | Fully open-source (MIT License) |

DeepSeek's success proved open-source models can compete with closed models, reigniting debates about AI's future.

Question 3: Do Export Controls Work?

If U.S. chip regulations actually spurred China's efficient innovation, should the strategy be reconsidered? Experts are divided:

  • Pro-regulation: Without controls, China would have advanced faster
  • Skeptics: Regulations can't stop innovation and may stimulate it
  • Middle ground: Regulations need to be paired with accelerating U.S. domestic innovation

Question 4: Democratization or New Risks?

DeepSeek R1's open-source release has two sides:

Positive aspects:

  • Resource-limited researchers and developers access top-tier AI
  • Strengthening AI capabilities in the Global South
  • Increased research transparency

Concerns:

  • Censorship and bias spreading globally
  • Potential for misuse (deepfakes, scams, etc.)
  • Connection to Chinese government (data security concerns)


Glossary

| Term | Definition |
| --- | --- |
| DeepSeek | Hangzhou-based AI startup founded by quant hedge fund High-Flyer |
| GRPO | Group Relative Policy Optimization, an efficient RL algorithm developed by DeepSeek |
| MoE | Mixture of Experts, an efficient architecture that activates only some parameters based on the input |
| R1-Zero | DeepSeek's experimental model trained purely through RL without supervised fine-tuning |
| R1-1776 | Perplexity's censorship-removed version of DeepSeek R1 |
| Knowledge Distillation | Technique for transferring knowledge from large models to smaller ones |
| Cold Start | Small amount of high-quality seed data used in R1 training |
| Chain-of-Thought | An AI model's step-by-step problem-solving thought process |

Update Log

| Date | Changes |
| --- | --- |
| 2026-01-06 | Initial publication |

This content does not constitute investment advice. When using specific AI services, please review their terms of service, privacy policies, and data security policies.

© 2026 PRISM by Liabooks. All rights reserved.


Authors

Min Hwang

"17 years in the field, now telling the story of technology"
