Chinese AI Bots Are Quietly Harvesting the Global Web

Mysterious bots from Lanzhou, China, are flooding websites worldwide, distorting analytics and raising questions about data sovereignty in the AI age.

14.7% of US Government Website Visitors Are From One Chinese City

Alejandro Quintero thought he'd struck gold in China. The Bogotá-based blogger runs a paranormal website written in "Spanglish"—hardly content designed for Asian audiences. But last October, his site suddenly exploded with traffic from China and Singapore, accounting for over half his total visits in the past year.

The celebration was short-lived. Digging into Google Analytics, Quintero discovered something unsettling: all Chinese visitors came from one city—Lanzhou—spent 0 seconds on pages, and never scrolled or clicked. They weren't readers. They were bots.
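For operators who want to run the same check themselves, here is a minimal sketch of that kind of filtering, assuming a session-level analytics export with hypothetical columns (city, country, engagement_seconds, sessions); the file and column names are illustrative, not from the article:

```python
import csv
from collections import defaultdict

# Sketch: flag cities whose sessions look like bot traffic (zero
# engagement time), assuming a hypothetical session-level CSV export
# with columns: city, country, engagement_seconds, sessions.
def suspicious_cities(path, min_sessions=500):
    totals = defaultdict(int)           # sessions per (country, city)
    zero_engagement = defaultdict(int)  # zero-second sessions per (country, city)

    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = (row["country"], row["city"])
            sessions = int(row["sessions"])
            totals[key] += sessions
            if float(row["engagement_seconds"]) == 0:
                zero_engagement[key] += sessions

    # Report locations where nearly every session shows zero engagement.
    for key, total in sorted(totals.items(), key=lambda kv: -kv[1]):
        if total >= min_sessions and zero_engagement[key] / total > 0.95:
            print(f"{key[1]}, {key[0]}: {total} sessions, "
                  f"{zero_engagement[key] / total:.0%} zero-engagement")

if __name__ == "__main__":
    suspicious_cities("sessions_export.csv")
```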

A Global Bot Invasion Nobody Saw Coming

Quintero wasn't alone. Since September, websites worldwide have reported identical bot swarms from China and Singapore: Indian lifestyle magazines, Canadian island blogs, personal portfolios, a weather platform with 15 million pages, Shopify stores, even US government domains.

The scale is staggering. According to Analytics.usa.gov, 14.7% of US government website visits in the last 90 days came from Lanzhou, with 6.6% from Singapore—making them the world's top two cities supposedly hungry for American government information.

But Lanzhou is a second-tier manufacturing hub in northwest China, not a tech center or data center hotspot. So why there?

The Tencent, Alibaba, Huawei Connection

Gavin King, founder of Known Agents, dug deeper into his own website's bot traffic. The smoking gun: all traffic routes through servers belonging to major Chinese cloud providers. King traced bots to Tencent's Autonomous System Number 132203, while weather site manager "Andy" detected traffic from ASNs linked to Tencent, Alibaba, and Huawei.
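Tracing an address back to its autonomous system is something any operator can reproduce from public routing data. The sketch below uses RIPEstat's public data API; the endpoint, response fields, and the IP address are assumptions for illustration, not details from the article:

```python
import json
import urllib.request

# Sketch: map an IP address to the autonomous system announcing it,
# using RIPEstat's public "network-info" data call (endpoint and
# response fields assumed as documented at stat.ripe.net).
def asn_for_ip(ip):
    url = f"https://stat.ripe.net/data/network-info/data.json?resource={ip}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)["data"]
    return data.get("asns", []), data.get("prefix")

if __name__ == "__main__":
    # Placeholder address from a server log, not one cited in the article.
    asns, prefix = asn_for_ip("203.0.113.10")
    print(f"ASNs: {asns}, announced prefix: {prefix}")
```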

Lanzhou might just be a routing waypoint—all traffic eventually passes through Singapore, suggesting a sophisticated infrastructure designed to obscure true origins.

Not Your Typical AI Scrapers

These Chinese bots behave differently from known AI crawlers. First, there are simply way more of them. King reports they account for 22% of his site's traffic, while all other AI bots combined represent less than 10%.

Second, they're deliberately deceptive. Companies like OpenAI and Google clearly identify their bots and respect blocking rules. These bots have masqueraded as human users from day one, even slipping past common bot-detection systems.
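Transparent crawlers can be checked against a site's robots.txt precisely because they send documented user-agent strings (OpenAI's GPTBot, for example). Here's a quick sketch of that check using Python's standard library and an example domain:

```python
from urllib import robotparser

# Sketch: check whether a declared crawler user agent is allowed to
# fetch a page, per the site's robots.txt. This only works for bots
# that honestly identify themselves; the bots described above do not.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

for agent in ["GPTBot", "Googlebot", "*"]:
    allowed = rp.can_fetch(agent, "https://example.com/some-article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```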

The Hidden Costs of Bot Traffic

While not explicitly malicious, the bots create real problems. Website owners face higher bandwidth costs as bot traffic crowds out human users. Analytics become meaningless when half your "audience" consists of 0-second visits from Chinese bots.

The biggest impact hits ad-dependent sites. "This is destroying my AdSense strategies," says Quintero, "because they're saying your content isn't valuable to viewers." Sites flooded with bot traffic risk being penalized by advertising algorithms, crushing revenue.

Fighting Back With Makeshift Solutions

Website operators are sharing defense strategies on platforms like Reddit. They've learned these bots often identify themselves as browsers running outdated Windows versions with uncommon screen ratios, characteristics that make it possible to block them in bulk.
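A rough version of that heuristic looks like the sketch below; the specific Windows versions, aspect ratios, and thresholds are assumptions for illustration, not values reported by the operators:

```python
# Sketch of the fingerprinting heuristic described above: flag sessions
# that report an outdated Windows version together with an uncommon
# screen aspect ratio. Signatures and thresholds are illustrative.
OLD_WINDOWS_TOKENS = ("Windows NT 6.1", "Windows NT 6.2", "Windows NT 6.3")
COMMON_RATIOS = (16 / 9, 16 / 10, 4 / 3, 3 / 2)

def looks_like_bot(user_agent, screen_width, screen_height):
    old_windows = any(token in user_agent for token in OLD_WINDOWS_TOKENS)
    ratio = screen_width / screen_height
    odd_ratio = all(abs(ratio - common) > 0.05 for common in COMMON_RATIOS)
    return old_windows and odd_ratio

if __name__ == "__main__":
    ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36"
    # Portrait-orientation screen, uncommon for a desktop Windows browser.
    print(looks_like_bot(ua, 1024, 1280))
```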

Andy blocked four ASNs linked to Chinese cloud providers, reducing daily bot visits from 127,000 to just over 2,000. But it's whack-a-mole—new bot networks keep emerging.
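Blocking at the ASN level usually means expanding the ASN into its announced IP prefixes and denying those ranges. A sketch of that step, assuming RIPEstat's public "announced-prefixes" data call and its response shape, emitting nginx-style deny lines:

```python
import json
import urllib.request

# Sketch: expand an autonomous system number into the IP prefixes it
# announces, via RIPEstat's public "announced-prefixes" data call
# (endpoint and response fields assumed), then print deny rules that
# could go into a web server or firewall config.
def announced_prefixes(asn):
    url = ("https://stat.ripe.net/data/announced-prefixes/data.json"
           f"?resource=AS{asn}")
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)["data"]
    return [entry["prefix"] for entry in data.get("prefixes", [])]

if __name__ == "__main__":
    for asn in (132203,):  # Tencent's ASN, per the article
        for prefix in announced_prefixes(asn):
            print(f"deny {prefix};")
```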

Some operators resort to the nuclear option: blocking all traffic from China and Singapore outright. It's effective but crude, potentially cutting off legitimate users.
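Country-level blocking typically relies on a GeoIP lookup at the edge. A sketch using the geoip2 library and a locally downloaded MaxMind GeoLite2 country database; the file path, blocklist, and test address are assumptions, and the database must be obtained separately:

```python
import geoip2.database
import geoip2.errors

# Sketch: drop requests whose source IP geolocates to a blocked country.
# Assumes the geoip2 package and a local GeoLite2-Country database file.
BLOCKED_COUNTRIES = {"CN", "SG"}

reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def should_block(ip):
    try:
        country = reader.country(ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False  # unknown origin: let it through rather than over-block
    return country in BLOCKED_COUNTRIES

if __name__ == "__main__":
    print(should_block("203.0.113.10"))  # placeholder address
```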

The Data Sovereignty Question

Why are Chinese entities harvesting global web data at this scale? The leading theory: AI training data collection. As language models grow hungrier for content, the race to scrape the internet intensifies.

But unlike transparent AI companies, these operations work in the shadows. They're not asking permission or respecting robots.txt files. They're taking whatever they want from websites that can't effectively fight back.

The internet was built on openness, but that openness is now being weaponized. The question isn't just how to stop these bots—it's whether we can preserve an open web while protecting against those who would exploit it.
