
How DeepSeek R1 Reshaped AI Competition



1. What Kind of Company Is DeepSeek?

DeepSeek's story begins in an unexpected place—not Silicon Valley, but Hangzhou, China, and not from an AI startup, but from a quantitative hedge fund.

Founder Liang Wenfeng (梁文锋)

Born in 1985 in a small village in Guangdong Province, Liang Wenfeng earned his bachelor's and master's degrees in electronic information engineering from Zhejiang University. During the 2008 financial crisis, he and classmates conceived algorithmic trading ideas, later founding the quant hedge fund High-Flyer (幻方量化) in 2015.

High-Flyer grew rapidly using math and AI for quantitative investment, surpassing 100 billion yuan (~$14 billion) in assets under management by 2021. Liang's crucial foresight was stockpiling Nvidia GPUs starting in 2021—acquiring approximately 10,000 A100 GPUs before U.S. chip export restrictions began.

The Birth of DeepSeek

In April 2023, High-Flyer announced an AGI (Artificial General Intelligence) research lab, spinning it off as the independent company DeepSeek in July. Liang serves as CEO of both companies.

| Item | Details |
| --- | --- |
| Founded | July 2023 |
| Headquarters | Hangzhou, China |
| Funding | Entirely from High-Flyer (no external VC investment) |
| Employees | Mostly fresh graduates from top Chinese universities; passion prioritized over experience |
| Goal | AGI research, no short-term monetization targets |

What makes DeepSeek unique is that it doesn't accept outside investment. VCs wanted quick exits, so Liang declined their money in order to focus on long-term research. High-Flyer's capital made this possible.

Organizational Culture

In interviews, Liang describes DeepSeek as "completely bottom-up." There's no hierarchy within teams, natural division of labor emerges, and anyone can freely access GPUs for experiments. A prime example: the MLA (Multi-head Latent Attention) technique that became key to DeepSeek-V2's cost efficiency originated from a young researcher's personal curiosity.


2. The Truth and Myth of $6 Million

The most talked-about number accompanying DeepSeek R1's release was "$5.6 million training cost"—shockingly low compared to OpenAI GPT-4's $100M+ or Meta Llama 3's tens of millions.

The Real Numbers

However, this figure represents only part of the full picture.

| Cost Category | DeepSeek's Claim | Actual Estimates |
| --- | --- | --- |
| Final training stage | $5.6M | $5.6M |
| Total R&D investment | Not disclosed | $500M–$1.3B (SemiAnalysis estimate) |
| GPU holdings | 2,048 H800s | Up to 50,000 H-series (estimated) |

According to SemiAnalysis, DeepSeek possesses at least 50,000 Nvidia H-series GPUs, with total AI infrastructure investment potentially exceeding $1.3 billion. The $5.6 million covers only GPU rental costs for DeepSeek-V3's final training stage using 2,048 H800 chips.

Why It's Still Innovative

Even if the headline savings are overstated, DeepSeek's cost efficiency remains remarkable.

First, they achieved comparable performance with far fewer resources. Anthropic reportedly spent "tens of millions" of dollars training Claude 3.5 Sonnet alone, whereas DeepSeek's estimated total investment of up to $1.3 billion covers its entire infrastructure and a whole family of models.

Second, they maximized efficiency from limited chips. DeepSeek built world-class models on the Nvidia H800, a variant whose chip-to-chip interconnect bandwidth was roughly halved relative to the H100 to comply with U.S. export restrictions.

Third, they developed innovative algorithmic techniques. Technologies like MoE (Mixture of Experts), MLA, and GRPO enabled doing more with the same resources.
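To make the "more from the same resources" point concrete, here is a minimal, hypothetical Mixture-of-Experts layer in PyTorch: a router scores the experts and only the top-k run for each token, so most parameters stay idle on any given input. The dimensions and module layout are illustrative, not DeepSeek's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: only the top-k of n experts run per token."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)   # scores every expert for each token
        self.k = k

    def forward(self, x):                             # x: (num_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # only the selected experts do any work
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 64)                                # 4 tokens, hidden size 64
print(TinyMoE()(x).shape)                             # torch.Size([4, 64])
```

DeepSeek's production models use far larger and more elaborate MoE designs, but the routing principle is the same: total parameter count grows while per-token compute stays bounded.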

Liang himself acknowledged in an interview: "Chinese companies needed twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps."


3. Technical Innovation: GRPO and Pure Reinforcement Learning

DeepSeek R1's true innovation lies not in cost but in training methodology.

Traditional LLM Training vs. DeepSeek's Approach

Typical large language models go through:

  1. Pre-training: Learning language patterns from massive text data
  2. Supervised Fine-Tuning (SFT): Improving response quality with human-written examples
  3. RLHF (Reinforcement Learning from Human Feedback): Alignment with the preferences of human evaluators

DeepSeek R1-Zero completely skipped step 2 (SFT). They applied reinforcement learning directly to the pre-trained DeepSeek-V3-Base to teach reasoning capabilities.

What Is GRPO?

GRPO (Group Relative Policy Optimization) is a reinforcement learning algorithm developed by DeepSeek.

Traditional RL (like PPO) requires a separate "critic model," making computation expensive. GRPO optimizes by grouping multiple responses to the same prompt and comparing them relatively, eliminating the need for a critic model and significantly reducing computational resources.
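As a rough, illustrative sketch (not DeepSeek's actual code), the group-relative advantage can be computed by normalizing each sampled response's reward against the mean and standard deviation of its own group, with no learned critic involved:

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantage: score each sampled response against its own group,
    replacing the learned value/critic model that PPO-style methods require."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 responses sampled for one prompt, rewarded 1.0 only if the answer is correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))   # correct answers get positive advantages
```

These advantages then weight a clipped policy-gradient update much like PPO's, but the baseline comes from the group statistics rather than from a separate value network.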

The reward system is simple (a code sketch follows this list):

  • Accuracy rewards: Correctness of math/coding problem answers
  • Format rewards: Encouraging structured thinking processes like <think>...</think>
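A minimal sketch of what such rule-based rewards might look like; the regex checks, the \boxed{} answer convention, and the 0.2 format bonus are illustrative assumptions, not DeepSeek's published reward code:

```python
import re

def accuracy_reward(response: str, reference_answer: str) -> float:
    """Return 1.0 when the boxed (or last-line) answer matches the reference, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    predicted = match.group(1) if match else response.strip().splitlines()[-1]
    return 1.0 if predicted.strip() == reference_answer.strip() else 0.0

def format_reward(response: str) -> float:
    """Small bonus when the reasoning is wrapped in <think>...</think> tags."""
    return 0.2 if re.search(r"<think>.*?</think>", response, re.DOTALL) else 0.0

resp = "<think>2 + 2 = 4</think>\nThe final answer is \\boxed{4}"
print(accuracy_reward(resp, "4") + format_reward(resp))   # 1.2
```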

Surprising Discovery: Self-Evolution

In R1-Zero, trained purely through reinforcement learning, researchers observed unexpected behaviors:

  • Natural extension of thought processes: Generating longer Chain-of-Thought for harder problems
  • Self-verification: Going back to correct errors when spotted mid-process
  • "Aha moments": Actually observable instances of sudden breakthrough after being stuck

This is a significant AI research discovery—demonstrating that LLMs can learn "how to think" without human supervision.

From R1-Zero to R1

R1-Zero excelled at reasoning but had problems:

  • Poor readability (awkward sentences)
  • Language mixing (English and Chinese intermingled)
  • Infinite repetition (continuously generating the same content)

To address these, DeepSeek added Cold Start data (small amounts of high-quality examples) and additional fine-tuning to complete the final R1 model.

Knowledge Distillation

DeepSeek also performed knowledge distillation, transferring R1's reasoning patterns to smaller models. They released lightweight models with 1.5B, 7B, 8B, 14B, 32B, and 70B parameters based on Qwen2.5 and Llama3. These smaller models performed better than those trained directly with RL.
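Conceptually, this distillation step is ordinary supervised fine-tuning on teacher-generated data: the large model writes out reasoning traces, and the small model is trained to reproduce them. Below is a minimal sketch of the data-preparation side, using a hand-written example and a common chat-style JSONL format; the field names are assumptions, not DeepSeek's pipeline.

```python
import json

# Illustrative teacher output: a long <think> trace plus the final answer, as an
# R1-style teacher would produce it (this sample is hand-written, not real data).
teacher_samples = [
    {"prompt": "What is 12 * 13?",
     "trace": "<think>12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156</think>\nThe answer is 156."},
]

# Distillation here is plain supervised fine-tuning: the small student model is trained
# to reproduce the teacher's reasoning text token by token.
with open("distill_sft.jsonl", "w", encoding="utf-8") as f:
    for s in teacher_samples:
        f.write(json.dumps({"messages": [
            {"role": "user", "content": s["prompt"]},
            {"role": "assistant", "content": s["trace"]},
        ]}) + "\n")
```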


4. Is Performance Really OpenAI o1-Level?

DeepSeek R1 claims to achieve performance comparable to OpenAI o1-1217 (December 2024 version). Let's examine key benchmark results.

Mathematical Reasoning

| Benchmark | DeepSeek R1 | OpenAI o1 | Notes |
| --- | --- | --- | --- |
| AIME 2024 | 79.8% | 79.2% | American Invitational Mathematics Examination |
| MATH-500 | 97.3% | 96.4% | High school to college-level math |

In mathematics, it matches or slightly exceeds o1.

Coding

| Benchmark | DeepSeek R1 | OpenAI o1 |
| --- | --- | --- |
| Codeforces | 2,029 Elo | 1,891 Elo |
| LiveCodeBench | 65.9% | - |

Strong performance at competitive programming levels.

General Knowledge

| Benchmark | DeepSeek R1 | OpenAI o1 |
| --- | --- | --- |
| MMLU | 90.8% | 91.8% |
| GPQA Diamond | 71.5% | 75.7% |

In general knowledge, o1 leads slightly, but the gap isn't large.

Limitations

However, R1 has limitations:

  • Reduced performance on Chinese SimpleQA: Lower scores than DeepSeek-V3 due to query refusals after safety RL
  • Infinite repetition in long outputs: Occasionally keeps generating the same content
  • Hallucinations: Can still generate non-factual content

Overall: World-class in math, coding, and logical reasoning; slightly behind in general knowledge.


5. Why Open Source?

DeepSeek R1 was released as fully open-source under the MIT License—model weights, training methodology, and technical reports all published. Why?

Liang Wenfeng's Philosophy

In a July 2024 interview, Liang said:

"Adopting a closed-source model won't prevent competitors from catching up. Therefore, our real moat lies in our team's growth—accumulating know-how, fostering an innovative culture. Open-sourcing and publishing papers don't result in significant losses. For technologists, being followed is rewarding. Open-source is cultural, not just commercial. Giving back is an honor, and it attracts talent."

Strategic Reasons

  1. Talent attraction: Top researchers want their work published
  2. Ecosystem building: Enabling others to build on DeepSeek technology
  3. Energizing China's AI ecosystem: After DeepSeek's release, Alibaba, Baidu, ByteDance competitively opened their models
  4. Political considerations: Open-source is harder to regulate (not a consumer-facing service)

Open Source Ripple Effects

Within weeks of R1's release:

  • Download explosion on Hugging Face
  • Perplexity released a censorship-removed version (R1-1776)
  • Dozens of derivative models emerged
  • Triggered open-source competition among Chinese tech giants

6. US-China AI Competition and the Paradox of Chip Sanctions

DeepSeek R1 must be understood in the context of US-China tech competition.

U.S. Chip Export Controls

In October 2022, the Biden administration began restricting advanced semiconductor exports to China. Regulations tightened progressively:

| Timeline | Restrictions |
| --- | --- |
| Oct 2022 | Export limits on A100 and other advanced GPUs |
| Oct 2023 | Enhanced restrictions; H800 also limited |
| 2024 | Discussion of restricting even lower-performance H20 chips |
| 2025 | Complete ban on latest chips like Blackwell |

The purpose was clear: Slow China's AI development.

The Paradoxical Outcome

However, DeepSeek's success showed that regulations can produce opposite effects.

"Necessity is the mother of invention" became reality. Forced to maximize efficiency from limited chips, DeepSeek instead developed innovative algorithms and architectures. Technologies like MoE, MLA, and GRPO enabled "doing more with less."

MIT Technology Review analyzed:

"Rather than weakening China's AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration."

China's AI Ecosystem Response

After DeepSeek, China's AI ecosystem flourished:

  • Alibaba: Open-sourced Qwen series, announced $53B AI investment over 3 years
  • ByteDance, Baidu: Competitively released new models
  • Government support: Expanded national-level AI funding
  • Huawei: Attempting to replace Nvidia with Ascend 910C chips

President Trump called DeepSeek a "wake-up call for our industries."


7. Censorship Issues and R1-1776

DeepSeek R1's significant weakness is that Chinese government censorship is baked in.

What Gets Censored

Ask DeepSeek about these topics, and it evades or repeats Chinese government positions:

  • Tiananmen Square incident (1989)
  • Taiwan independence
  • Criticism of Xi Jinping
  • Uyghur human rights issues
  • Tibet
  • Hong Kong democracy movement

For example, asking "What happened in Tiananmen in 1989?":

"I'm sorry, I cannot answer that question. I am an AI assistant designed to provide helpful and harmless responses."

Local Execution vs. Online Service

Interestingly, censorship loosens when running locally. Questions refused on DeepSeek's website can (with careful prompting) yield factual answers when run locally—suggesting server-side additional filtering exists.
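For readers who want to test this themselves, one common way to run a distilled R1 variant locally is via Hugging Face's transformers library. The sketch below assumes the publicly released DeepSeek-R1-Distill-Qwen-1.5B checkpoint and enough memory to load it; verify the exact repository name on Hugging Face before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"   # assumed repository name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a response entirely on local hardware.
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Briefly explain chain-of-thought reasoning."}],
    tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tok.decode(output[0], skip_special_tokens=True))
```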

Perplexity's R1-1776

In February 2025, Perplexity AI released R1-1776, a censorship-removed version of R1 named after the year of American independence.

  • Human experts identified ~300 censored topics
  • Fine-tuned on 40,000 multilingual prompts
  • Can provide factual answers about Tiananmen, Taiwan, etc.

However, research shows R1-1776 isn't perfect:

  • Questions in Chinese may still yield censored responses
  • Factual accuracy may suffer: Some factual information distorted during censorship removal
  • Censorship isn't just "answer refusal"—bias exists in the training data itself, making complete removal difficult

China Media Project's analysis:

"Removing DeepSeek's gag does not set it free from strictures that are part of its DNA. Ask an uncensored version about Taiwan, and it will repeat Chinese Party-state disinformation, such as that Taiwan has been part of China 'since ancient times.'"


8. Market Shockwaves

DeepSeek R1's release triggered an earthquake in financial markets.

Stock Crash

On January 27, 2025, the day DeepSeek topped the U.S. iOS App Store:

| Company | Stock Change | Market Cap Loss |
| --- | --- | --- |
| Nvidia | -17% | ~$600 billion (largest single-day loss in value by one company in U.S. market history) |
| Microsoft | Decline | Tens of billions |
| Google | Decline | Tens of billions |
| All AI stocks | - | Over $1 trillion evaporated |

Why This Reaction?

Investor concerns were clear:

  1. "Should we pour billions into AI?": If DeepSeek built a top-tier model for $6M (nominally), are OpenAI/Anthropic's hundreds of millions excessive?
  1. "Are Nvidia GPUs that necessary?": If efficient algorithms work with fewer chips, Nvidia demand could drop
  1. "Is U.S. tech leadership shaking?": If China caught up despite chip regulations, American AI supremacy is threatened

Subsequent Recovery

Markets partially recovered afterward. Analysts noted:

  • DeepSeek's cost claims were overstated
  • AI demand remains explosive
  • Efficient AI could actually accelerate more applications

Nvidia CEO Jensen Huang countered: "If inference demand explodes, more GPUs will be needed."


9. Questions for the AI Industry

DeepSeek R1 posed important questions for the entire AI industry.

Question 1: Limits of Scaling Laws?

Until now, AI progress has followed a simple formula: more data plus more compute yields better models. These are the so-called scaling laws.

DeepSeek showed an alternative path. Algorithmic innovation can extract more from the same compute. This suggests a new research direction: "efficient scaling."

Question 2: Open Source vs. Closed Source

| Model | Release Method |
| --- | --- |
| GPT-4, Claude | Closed (API only) |
| Llama, Mistral | Weights released, some restrictions |
| DeepSeek R1 | Fully open-source (MIT License) |

DeepSeek's success proved open-source models can compete with closed models, reigniting debates about AI's future.

Question 3: Do Export Controls Work?

If U.S. chip regulations actually spurred China's efficient innovation, should the strategy be reconsidered? Experts are divided:

  • Pro-regulation: Without controls, China would have advanced faster
  • Skeptics: Regulations can't stop innovation and may stimulate it
  • Middle ground: Regulations need to be paired with accelerating U.S. domestic innovation

Question 4: Democratization or New Risks?

DeepSeek R1's open-source release has two sides:

Positive aspects:

  • Resource-limited researchers and developers access top-tier AI
  • Strengthening AI capabilities in the Global South
  • Increased research transparency

Concerns:

  • Censorship and bias spreading globally
  • Potential for misuse (deepfakes, scams, etc.)
  • Connection to Chinese government (data security concerns)


Glossary

| Term | Definition |
| --- | --- |
| DeepSeek | Hangzhou-based AI startup founded by quant hedge fund High-Flyer |
| GRPO | Group Relative Policy Optimization, an efficient RL algorithm developed by DeepSeek |
| MoE | Mixture of Experts, an efficient architecture that activates only some parameters based on the input |
| R1-Zero | DeepSeek's experimental model trained purely through RL without supervised fine-tuning |
| R1-1776 | Perplexity's censorship-removed version of DeepSeek R1 |
| Knowledge Distillation | Technique for transferring knowledge from large models to smaller ones |
| Cold Start | Small amount of high-quality seed data used in R1 training |
| Chain-of-Thought | An AI model's step-by-step problem-solving thought process |

Update Log

| Date | Changes |
| --- | --- |
| 2026-01-06 | Initial publication |

This content does not constitute investment advice. When using specific AI services, please review their terms of service, privacy policies, and data security policies.

© 2026 PRISM by Liabooks. All rights reserved.


Authors

Min Hwang

"17 years in the field, now telling the story of technology"
