Chinese AI Bots Are Quietly Harvesting the Global Web
Mysterious bots from Lanzhou, China are flooding websites worldwide, distorting analytics and raising questions about data sovereignty in the AI age.
14.7% of US Government Website Visitors Are From One Chinese City
Alejandro Quintero thought he'd struck gold in China. The Bogotá-based blogger runs a paranormal website written in "Spanglish"—hardly content designed for Asian audiences. But last October, his site suddenly exploded with traffic from China and Singapore, accounting for over half his total visits in the past year.
The celebration was short-lived. Digging into Google Analytics, Quintero discovered something unsettling: all Chinese visitors came from one city—Lanzhou—spent 0 seconds on pages, and never scrolled or clicked. They weren't readers. They were bots.
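The pattern Quintero spotted (one city, zero seconds on page, no scrolls, no clicks) can be checked mechanically against an analytics export. The following is a minimal sketch, not Google Analytics code: the `Session` fields, the thresholds, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical fields; real analytics exports name these differently.
    country: str
    city: str
    engaged_seconds: float
    scrolled: bool
    clicks: int


def is_zero_engagement(s: Session) -> bool:
    """True when a session shows no sign of a human reader."""
    return s.engaged_seconds == 0 and not s.scrolled and s.clicks == 0


def suspicious_cities(sessions, min_sessions=3, min_zero_share=0.95):
    """Return (country, city) pairs where nearly every session is empty.

    The thresholds are illustrative assumptions, not tuned values.
    """
    totals = Counter()
    zeros = Counter()
    for s in sessions:
        key = (s.country, s.city)
        totals[key] += 1
        if is_zero_engagement(s):
            zeros[key] += 1
    return sorted(
        key
        for key, n in totals.items()
        if n >= min_sessions and zeros[key] / n >= min_zero_share
    )


# Made-up sample data: five empty sessions from one city, mixed traffic elsewhere.
sample = [Session("CN", "Lanzhou", 0, False, 0) for _ in range(5)] + [
    Session("CO", "Bogota", 42.0, True, 2),
    Session("CO", "Bogota", 0, False, 0),
    Session("CO", "Bogota", 15.5, True, 0),
]
flagged = suspicious_cities(sample)
```

A single bounced visit is normal, which is why the sketch looks at the share of empty sessions per city rather than at individual sessions.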
A Global Bot Invasion Nobody Saw Coming
Quintero wasn't alone. Since September, websites worldwide have reported identical bot swarms from China and Singapore: Indian lifestyle magazines, Canadian island blogs, personal portfolios, a weather platform with 15 million pages, Shopify stores, even US government domains.
The scale is staggering. According to Analytics.usa.gov, 14.7% of US government website visits in the last 90 days came from Lanzhou, with 6.6% from Singapore—making them the world's top two cities supposedly hungry for American government information.
But Lanzhou is a second-tier manufacturing hub in northwest China, not a tech center or data center hotspot. So why there?
The Tencent, Alibaba, Huawei Connection
Gavin King, founder of Known Agents, dug deeper into his own website's bot traffic. The smoking gun: all traffic routes through servers belonging to major Chinese cloud providers. King traced bots to Tencent's Autonomous System Number 132203, while weather site manager "Andy" detected traffic from ASNs linked to Tencent, Alibaba, and Huawei.
Lanzhou might just be a routing waypoint—all traffic eventually passes through Singapore, suggesting a sophisticated infrastructure designed to obscure true origins.
Not Your Typical AI Scrapers
These Chinese bots behave differently from known AI crawlers. First, they are far more numerous: King reports they account for 22% of his site's traffic, while all other AI bots combined represent less than 10%.
Second, they're deliberately deceptive. Companies like OpenAI and Google clearly identify their bots and respect blocking rules. These bots masqueraded as human users from day one, even bypassing common bot-detection systems.
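For contrast, this is roughly what respecting blocking rules looks like for a crawler that identifies itself. The sketch uses Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `ExampleBot` name are made up for illustration; `GPTBot` is the user-agent token OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block one named AI crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching, using its own declared name.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/articles/1")
other_allowed = parser.can_fetch("ExampleBot", "https://example.com/articles/1")
other_private = parser.can_fetch("ExampleBot", "https://example.com/private/x")
```

The check only works when the crawler declares who it is and chooses to ask. A bot that presents itself as an ordinary browser never makes this call, which is why robots.txt offers no protection against the traffic described here.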
The Hidden Costs of Bot Traffic
While not explicitly malicious, the bots create real problems. Website owners face higher bandwidth costs as bot traffic crowds out human users. Analytics become meaningless when half your "audience" consists of 0-second visits from Chinese bots.
The biggest impact hits ad-dependent sites. "This is destroying my AdSense strategies," says Quintero, "because they're saying your content isn't valuable to viewers." Sites flooded with bot traffic risk being penalized by advertising algorithms, crushing revenue.
Fighting Back With Makeshift Solutions
Website operators are sharing defense strategies on platforms like Reddit. They've learned these bots often present as old Windows versions with uncommon screen ratios—characteristics that enable group blocking.
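A filter built on those two traits might look like the sketch below. The article does not say which Windows versions or screen ratios the operators matched on, so both are assumptions here: "old" is taken to mean anything before Windows 10 (user agents reporting an NT version below 10.0), and "uncommon" means far from a short list of familiar aspect ratios.

```python
import re

# Assumption: "old" means releases before Windows 10, i.e. NT version < 10.0.
WINDOWS_NT = re.compile(r"Windows NT (\d+)\.(\d+)")

# Assumption: a short allow-list of familiar aspect ratios (long side / short side).
COMMON_RATIOS = (16 / 9, 16 / 10, 4 / 3, 21 / 9, 3 / 2)


def is_old_windows(user_agent: str) -> bool:
    """True if the user agent reports a Windows NT version below 10.0."""
    match = WINDOWS_NT.search(user_agent)
    if not match:
        return False
    version = (int(match.group(1)), int(match.group(2)))
    return version < (10, 0)


def has_uncommon_ratio(width: int, height: int, tolerance: float = 0.05) -> bool:
    """True if the screen's aspect ratio is not close to any common one."""
    if width <= 0 or height <= 0:
        return True  # nonsensical dimensions are treated as suspicious
    ratio = max(width, height) / min(width, height)
    return all(abs(ratio - common) > tolerance for common in COMMON_RATIOS)


def looks_like_swarm_bot(user_agent: str, width: int, height: int) -> bool:
    """Flag only when both traits appear together, to limit false positives."""
    return is_old_windows(user_agent) and has_uncommon_ratio(width, height)


WIN7_UA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36"
WIN10_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
```

Requiring both signals matters: plenty of real people still run old Windows or odd monitors, but the combination is much rarer among human visitors.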
Andy blocked four ASNs linked to Chinese cloud providers, reducing daily bot visits from 127,000 to just over 2,000. But it's whack-a-mole—new bot networks keep emerging.
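Blocking by ASN usually comes down to matching a visitor's IP address against the address prefixes that network announces. The sketch below shows the idea with Python's standard `ipaddress` module. ASN 132203 is the Tencent number King cites; the prefixes are reserved documentation ranges used purely as placeholders, since a real list would have to come from current routing data.

```python
import ipaddress

# ASN 132203 is the one named in the article. The prefixes are RFC 5737
# documentation ranges standing in for real announcements (placeholders only).
PREFIXES_BY_ASN = {
    132203: ["192.0.2.0/24", "198.51.100.0/24"],
}


def build_blocklist(prefixes_by_asn):
    """Parse prefix strings once into (network, asn) pairs."""
    return [
        (ipaddress.ip_network(prefix), asn)
        for asn, prefixes in prefixes_by_asn.items()
        for prefix in prefixes
    ]


def blocked_asn(ip: str, blocklist):
    """Return the blocked ASN that covers this IP, or None if it is allowed."""
    addr = ipaddress.ip_address(ip)
    for network, asn in blocklist:
        if addr.version == network.version and addr in network:
            return asn
    return None


blocklist = build_blocklist(PREFIXES_BY_ASN)
hit = blocked_asn("192.0.2.17", blocklist)
miss = blocked_asn("203.0.113.5", blocklist)
```

In practice this kind of rule usually lives in a firewall or CDN rather than in application code, and the prefix list has to be refreshed as networks change their announcements, which is part of why the approach turns into whack-a-mole.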
Some operators resort to nuclear options: completely blocking all traffic from China and Singapore. It's effective but crude, potentially cutting off legitimate users.
The Data Sovereignty Question
Why are Chinese entities harvesting global web data at this scale? The leading theory: AI training data collection. As language models grow hungrier for content, the race to scrape the internet intensifies.
But unlike transparent AI companies, these operations work in the shadows. They're not asking permission or respecting robots.txt files. They're taking whatever they want from websites that can't effectively fight back.
The internet was built on openness, but that openness is now being weaponized. The question isn't just how to stop these bots—it's whether we can preserve an open web while protecting against those who would exploit it.
This content is AI-generated based on source articles. While we strive for accuracy, errors may occur. We recommend verifying with the original source.