Who's Hunting Google and Baidu's Origin Servers? A Real-World Host Header Recon Analysis

Bots don’t just attack; they map. This dataset, collected by an internet-telescope-style sensor, captures structured reconnaissance in which attackers manipulate HTTP Host headers to probe infrastructure across major internet platforms. Rather than targeting a specific application, this traffic represents pre-attack intelligence gathering. It attempts to discover origin servers behind CDNs and WAFs, identify misconfigured virtual hosts, test routing behavior across edge infrastructure, and map backend exposure paths. Each request is an IOC (Indicator of Compromise) reflecting how automated systems prepare for deeper exploitation.

There's a category of internet noise that most people ignore — and that's exactly why it's dangerous. What we captured on the SAYOR.net internet telescope isn't vulnerability exploitation. It's something more patient: automated reconnaissance at scale, where bots probe random internet-facing servers by injecting forged HTTP Host headers referencing some of the biggest names in tech. Google. Baidu. Cloudflare. Microsoft. The question isn't why these names appear in our logs — it's what the attackers expect to find when they show up.

This post breaks down every category of host header abuse we observed, attributes probers to hosting providers and geographies using WHOIS and ASN data, and explains what the attacker is actually trying to accomplish.

What Is a Host Header Attack, and Why Does It Matter Here?

When an HTTP/1.1 request arrives at a server, it carries a Host header that tells the server which virtual host the client wants. In a normal request to google.com, that header reads Host: google.com. Attackers abuse this by sending requests to arbitrary internet servers with a forged Host header — for example, telling a random VPS in Germany to respond as if it were google.com's backend.
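To make this concrete, here is a minimal sketch of what such a probe looks like on the wire. It only builds the request bytes and never opens a connection. The function name `build_probe` and the User-Agent string are hypothetical illustrations, not values recovered from the telescope data.

```python
def build_probe(host_header: str, path: str = "/") -> bytes:
    """Return a raw HTTP/1.1 GET request carrying an arbitrary Host header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",          # forged: names a site the target does not host
        "User-Agent: probe-sketch/0.1",  # hypothetical UA string
        "Connection: close",
        "",                              # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Sent to any reachable IP, this asks that server to act as google.com's vhost.
    print(build_probe("google.com:443").decode())
```

Nothing in the request ties the Host value to the IP it is sent to. That is why a scanner can fire the same bytes at every address it can reach.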

Why would that be useful? Because many large platforms sit behind CDNs, WAFs, and load balancers. The actual origin servers — the machines doing the real computation — are often not directly exposed. If an attacker can find a server that responds to Host: google.com in a way that looks like an origin response, they've potentially discovered a direct-to-origin path, bypassing every protective layer in front of it. That's what this scanning activity is probing for.

Our sensor doesn't sit behind Google. It sits on the open internet, passively listening. The fact that it received these probes is not because anyone thought it was Google — it's because scanners throw forged Host headers at every IP they can reach, looking for the one that slips up.

Category 1 — Big Tech Origin Discovery: Google and Baidu

This is the most significant cluster in the dataset and the one with the clearest geopolitical signal.

Probing google.com:443

95.111.253.31 → google.com:443
185.249.225.89 → google.com:443
194.165.17.13 → google.com:443
36.255.98.221 → google.com:443
204.76.203.10 → google.com:443
138.124.14.152 → google.com:443
103.252.91.84 → google.com:443
138.121.198.109 → google.com:443
176.65.148.74 → google.com:443
152.42.200.86 → google.com:443
139.59.110.5 → google.com:443
165.245.179.4 → google.com:443
45.205.1.50 → google.com:443
176.65.149.215 → google.com:443
45.135.194.20 → google.com:443
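Lists in this "SRC_IP → HOST[:PORT]" form are easy to pivot on. The following is a minimal sketch that parses them and groups distinct scanners by forged Host value. The `Probe` type and both function names are hypothetical, and the parser assumes IPv4 sources and hostname or IPv4 targets, matching the notation used in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Probe(NamedTuple):
    src_ip: str          # scanner's source address
    host: str            # forged Host value, port removed
    port: Optional[int]  # explicit port from the Host value, if any


def parse_probe(line: str) -> Probe:
    """Parse one 'SRC_IP → HOST[:PORT]' line (IPv4 and hostnames only)."""
    src, _, target = line.partition("→")
    src, target = src.strip(), target.strip()
    host, sep, port = target.rpartition(":")
    if sep and port.isdigit():
        return Probe(src, host, int(port))
    return Probe(src, target, None)


def probers_by_target(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each forged Host value to the set of distinct source IPs that sent it."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if "→" in line:  # skip headings and blank lines
            probe = parse_probe(line)
            groups[probe.host].add(probe.src_ip)
    return dict(groups)
```

Fed the fifteen google.com lines above, it returns a single google.com entry with fifteen distinct source IPs. Run over the whole post, it also surfaces scanners that reuse one IP across several targets, such as 185.249.225.89, which appears against both google.com and codeforces.com.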

ASN/Geo Attribution:

95.111.253.31 is registered to Contabo GmbH, Munich, Germany (AS51167), one of the most frequently abused bulk VPS providers in Europe. Contabo's infrastructure is a known launchpad for scanning campaigns because of cheap, no-questions-asked provisioning. The 185.249.225.89, 138.124.x.x, and 176.65.x.x blocks also fall within European commercial hosting ranges commonly seen in reconnaissance datasets. 204.76.203.x is a US-hosted VPS range. 139.59.110.5 is a DigitalOcean Bangalore node (DO-BOT-IND range), and 152.42.200.86 belongs to DigitalOcean's Singapore cluster.
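The "x.x block" shorthand above can be reproduced mechanically as a first pass before doing real WHOIS or ASN lookups. This is a minimal sketch using only Python's standard `ipaddress` module. The /16 cut-off is an arbitrary assumption: real allocations rarely align with /16 boundaries, and this code does not fetch any ownership data.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_prefix(ips: Iterable[str], prefix_len: int = 16) -> Dict[str, List[str]]:
    """Bucket IPv4 addresses by their enclosing /prefix_len network."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ip in ips:
        # strict=False accepts a host address and returns its enclosing network
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append(ip)
    return dict(groups)
```

Applied to the google.com probers, this puts 176.65.148.74 and 176.65.149.215 into the same 176.65.0.0/16 bucket. Buckets like that are worth attributing first, since one lookup may cover several scanners.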

What's being attempted: Every one of these IPs is trying to reach a server that will respond to Host: google.com:443 as an origin would — exposing Google's actual backend IP or a misconfigured upstream that routes google.com traffic internally. This is a known precursor to direct-to-origin DDoS and WAF bypass attacks. The goal isn't to attack Google directly from here — it's to find a misconfigured server somewhere on the internet that leaks origin behavior.

Probing www.baidu.com:443

115.238.44.234 → www.baidu.com:443
202.107.226.5 → www.baidu.com:443
23.235.176.50 → www.baidu.com:443
61.228.211.72 → www.baidu.com:443
114.25.99.76 → www.baidu.com:443
58.240.112.150 → www.baidu.com:443
61.228.202.104 → www.baidu.com:443

ASN/Geo Attribution:

115.238.44.234 is a Hangzhou, China IP (China Telecom Zhejiang Province, AS4134). 202.107.226.5 also maps to China Telecom. 61.228.x.x and 114.25.x.x are Taiwanese IP ranges (Chunghwa Telecom, AS3462). 58.240.x.x is China Unicom Jiangsu.

The geopolitical pattern here is real. Probing for Baidu's origin servers is dominated almost entirely by East Asian IP space — Chinese mainland and Taiwan. The google.com probers lean European and global cloud-hosted. This is not a coincidence. East Asian botnet infrastructure and scanning campaigns naturally target East Asian CDN-protected services. Whether these IPs are compromised residential nodes, VPS bots, or deliberate scanning infrastructure, the geographic clustering with the target domain is consistent across the dataset.
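The "almost entirely" claim can be made concrete with a simple tally. This is a minimal sketch in which the country codes are taken directly from the attributions stated in this post. 23.235.176.50 is marked as unattributed because the post gives no attribution for it. `BAIDU_PROBERS` and `region_share` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Country per www.baidu.com prober, as attributed earlier in this post.
# None marks 23.235.176.50, which the post does not attribute.
BAIDU_PROBERS: Dict[str, Optional[str]] = {
    "115.238.44.234": "CN",  # China Telecom Zhejiang
    "202.107.226.5": "CN",   # China Telecom
    "23.235.176.50": None,
    "61.228.211.72": "TW",   # Chunghwa Telecom
    "114.25.99.76": "TW",    # Chunghwa Telecom
    "58.240.112.150": "CN",  # China Unicom Jiangsu
    "61.228.202.104": "TW",  # Chunghwa Telecom
}


def region_share(attribution: Dict[str, Optional[str]],
                 regions: Iterable[str]) -> Tuple[int, int]:
    """Return (probers attributed to any of `regions`, total probers)."""
    counts = Counter(attribution.values())
    return sum(counts[r] for r in set(regions)), len(attribution)
```

`region_share(BAIDU_PROBERS, ["CN", "TW"])` returns `(6, 7)`. The seventh prober is unattributed rather than known to be outside East Asia.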

Category 2 — CDN Behavior Mapping: Cloudflare and Google Speed

35.200.210.103 → speed.cloudflare.com
35.200.184.180 → speed.cloudflare.com
34.47.160.27 → speed.cloudflare.com
138.124.96.105 → cloudflare.com
83.142.209.181 → cloudflare.com:443

The 35.200.x.x range is a well-known GCP block — specifically Google Cloud Platform's asia-south1 region (Mumbai). Three separate GCP nodes probing speed.cloudflare.com suggests either compromised compute instances running as bots, or misconfigured automation that's firing connection checks through forged Host headers. speed.cloudflare.com is Cloudflare's own network speed testing endpoint — probing it via Host header injection is a way to map CDN edge routing logic and response timing differences that could expose backend routing inconsistencies.

The 83.142.209.181 → cloudflare.com:443 probe is European hosting infrastructure attempting the same against Cloudflare's primary domain.

Category 3 — OS Connectivity Check Abuse: Microsoft NCSI and Google CaptivePortal

78.138.177.80 → www.msftncsi.com:443
54.162.150.202 → www.msftncsi.com:443
198.46.146.122 → connectivitycheck.gstatic.com:443
69.165.72.109 → connectivitycheck.gstatic.com:443

msftncsi.com (Microsoft Network Connectivity Status Indicator) and connectivitycheck.gstatic.com are endpoints that Windows and Android devices use to verify internet connectivity. They're trusted, whitelisted, and generate near-zero suspicion in most network monitoring stacks.

54.162.150.202 is Amazon EC2 us-east-1 (AWS Virginia). 78.138.177.80 is European hosting. 198.46.146.122 and 69.165.72.109 are US VPS nodes. The fact that these cloud-hosted IPs are probing connectivity-check endpoints via forged Host headers suggests they're trying to blend into baseline traffic — standard evasion behavior designed to slip past IDS/IPS rules that whitelist system-level connectivity destinations.

Category 4 — HTTP Testing Infrastructure and Proxy Validation: httpbin.org

176.65.149.48 → httpbin.org
121.91.235.186 → httpbin.org
102.165.26.200 → httpbin.org
87.121.84.172 → httpbin.org

httpbin.org is a public HTTP request and response service used by developers. But attackers use it too — because it reflects request headers back verbatim, making it an ideal tool for validating whether a proxy or forged Host header is being passed through correctly or stripped by intermediary infrastructure.

These probes likely come from automated exploit frameworks validating their own payload delivery chains. If a bot sends Host: httpbin.org to a server and gets httpbin's reflection back, it knows that server forwards requests to whatever upstream the client-supplied Host header names, effectively acting as an open proxy. That means the server might be exploitable for SSRF or header injection against other targets.
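The "did we get a reflection?" test can be sketched as a small response check. This assumes the probe requested httpbin's /headers endpoint, which echoes request headers as a JSON object under a "headers" key. The dataset does not record which path these probes used, and the function names are hypothetical.

```python
import json
from typing import Optional


def reflected_host(body: str) -> Optional[str]:
    """Return the Host value echoed by an httpbin-style /headers response, else None."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    headers = data.get("headers") if isinstance(data, dict) else None
    if not isinstance(headers, dict):
        return None
    host = headers.get("Host")
    return host if isinstance(host, str) else None


def looks_forwarded(body: str, sent_host: str = "httpbin.org") -> bool:
    """True if the response carries httpbin's echo of the Host we sent,
    i.e. the probed server fetched httpbin on the client's behalf."""
    return reflected_host(body) == sent_host
```

A plain default page or an error body yields `False`. Only a body shaped like httpbin's echo, naming the Host the bot sent, yields `True`.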

Category 5 — Anomalous Targets: S3, Codeforces, Roblox, Sogou

90.148.155.161 → test.s3.amazonaws.com
185.249.225.89 → codeforces.com:443
23.95.130.27 → codeforces.com:443
204.76.203.87 → www.roblox.com:443
45.194.92.11 → www.roblox.com:443
43.133.146.123 → www.sogou.com
64.112.74.158 → authorit.sa.com

test.s3.amazonaws.com is a classic SSRF testing target — attackers probe this to check if a server will make upstream connections to AWS S3 on their behalf, which would confirm a server-side request forgery vulnerability. 90.148.155.161 is a European residential or VPS IP.

www.sogou.com (China's major search engine) probed from 43.133.146.123 — this is a Tencent Cloud IP (Hong Kong region, AS132203). Chinese cloud infrastructure probing a Chinese search engine's origin via forged Host headers is consistent with competitive intelligence gathering or vulnerability research originating from within the same regional ecosystem.

www.roblox.com probes from 204.76.203.87 and 45.194.92.11 — both US VPS ranges — likely represent gaming platform origin discovery or DDoS preparation targeting a high-traffic, CDN-protected platform.

authorit.sa.com is the most unusual target. Despite the ".sa" in the name, it is registered under sa.com in the .com TLD, not under Saudi Arabia's .sa country-code TLD, so the name alone does not tie it to a Saudi authority. 64.112.74.158 is a US-based IP. This probe stands out as potentially targeted reconnaissance rather than broad-spectrum scanning.

Category 6 — Null Routing and Bogon Host Headers: 0.0.0.0, 1.1.1.1, and Private Ranges

34.151.206.25 → 0.0.0.0
187.110.175.205 → 0.0.0.0
51.38.90.52 → 0.0.0.0
177.190.67.44 → 0.0.0.0
46.101.36.170 → 0.0.0.0
213.209.143.73 → 1.1.1.1:443
176.117.107.96 → 1.1.1.1:443
5.62.63.190 → 100.100.100.200
90.148.155.161 → 100.100.100.200
5.62.63.190 → 100.88.222.5

0.0.0.0 as a Host header is either scanner default-fill behavior or a deliberate probe for servers with a wildcard/default virtual host configured — which would respond to any Host value. If a server responds to Host: 0.0.0.0, it's almost certainly misconfigured, and its default vhost content may expose internal services, admin panels, or staging environments.

1.1.1.1:443 — probing Cloudflare's public DNS resolver via Host header is a check for whether the responding server is a Cloudflare proxy edge node. If it responds as one, the attacker knows they're looking at CDN infrastructure rather than an origin.

100.100.100.200 is Alibaba Cloud's instance metadata service address, Alibaba's equivalent of AWS's 169.254.169.254. It is probed as a Host header from IPs like 5.62.63.190 (UK hosting range, Porkbun LLC adjacent blocks) and 90.148.155.161 (European VPS). This suggests SSRF testing aimed specifically at servers that might be running on Alibaba Cloud infrastructure. If the server uses the Host value to make an internal request to the metadata service, the attacker can potentially harvest the instance's cloud credentials.

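100.88.222.5 falls within 100.64.0.0/10, the RFC 6598 shared address space. The same block contains 100.100.100.200, and some cloud providers use it for internal, non-internet-routable services. Same attacker (5.62.63.190), same playbook: probe for SSRF against internal cloud endpoints by injecting them as Host headers.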

**Geo-Attacker Pattern Summary**
| Target Domain | Dominant Probing Region | Likely Infrastructure |
|---|---|---|
| google.com:443 | Europe (DE, NL), US VPS, South/Southeast Asia cloud | Contabo, DigitalOcean, bulk VPS |
| www.baidu.com:443 | China Mainland, Taiwan | China Telecom, China Unicom, Chunghwa Telecom |
| speed.cloudflare.com | India (GCP asia-south1) | Compromised or misconfigured Google Cloud VMs |
| www.msftncsi.com | US East (AWS EC2), Europe | AWS, European hosting |
| httpbin.org | Europe, Africa VPS | Bulk hosting |
| 100.100.100.200 | UK/EU VPS | SSRF tooling |
100.100.100.200	UK/EU VPS	SSRF tooling

The East Asian clustering around Baidu and the European/global clustering around Google holds across the entire dataset. This is a meaningful signal. It doesn't mean nation-state actors — it more likely reflects botnet geographic distribution and the natural affinity of regional scanning infrastructure toward regional targets. But it's a pattern worth tracking.

What Should Defenders Do With This?

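First, validate your server's Host header handling. If your reverse proxy (nginx, Apache, Caddy) has a default virtual host configured, test what it returns for unknown Host values. It should return nothing useful. In nginx, that means a catch-all default_server block that does return 444, nginx's non-standard code for closing the connection without sending a response. On other servers, use a deliberate stub.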
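A quick way to run that test against a server you operate is to send it a Host value it does not serve and inspect the response. The following is a minimal sketch using only the standard library. `CatchAllHandler` is a hypothetical stand-in for a misconfigured default vhost that answers any Host, and `local_demo` runs the check against it on 127.0.0.1. Point `probe_unknown_host` only at your own infrastructure, and use `http.client.HTTPSConnection` for port 443.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CatchAllHandler(BaseHTTPRequestHandler):
    """Stand-in for a misconfigured default vhost: answers 200 for any Host."""

    def do_GET(self):
        body = (self.headers.get("Host") or "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # echo the Host it accepted

    def log_message(self, *args):  # keep the demo quiet
        pass


def probe_unknown_host(addr, port, fake_host, timeout=5.0):
    """Send GET / with a Host value you do not serve; return (status, body)."""
    conn = http.client.HTTPConnection(addr, port, timeout=timeout)
    try:
        # An explicit Host header replaces the one http.client would add
        conn.request("GET", "/", headers={"Host": fake_host})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def local_demo(fake_host="0.0.0.0"):
    """Spin up the catch-all server on a free local port and probe it once."""
    server = HTTPServer(("127.0.0.1", 0), CatchAllHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return probe_unknown_host("127.0.0.1", server.server_address[1], fake_host)
    finally:
        server.shutdown()
        server.server_close()
```

`local_demo()` gets a 200 whose body is the forged Host. That is exactly the "responds to anything" behavior scanners look for. Against an nginx catch-all that returns 444, expect `getresponse()` to raise an exception such as `http.client.RemoteDisconnected` instead.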

Second, if you're behind a CDN or WAF, ensure your origin server is not directly reachable on port 443 (or 80) from the public internet. At the firewall, allow inbound web traffic only from your CDN provider's published IP ranges.

Third, flag inbound Host headers containing domains you don't own. If google.com appears in your access logs as a Host header, something has gone wrong — either a misconfigured client or an active probe. Neither should reach your application layer.
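That check reduces to comparing the Host value against the domains you actually serve. This is a minimal sketch in which `owned_domains` is a hypothetical, lowercase set you would maintain yourself. Subdomains of an owned domain count as yours. Bracketed IPv6 literals are out of scope for this simple port-stripping logic.

```python
from typing import Iterable


def normalize_host(host_header: str) -> str:
    """Lowercase, drop an explicit port and a trailing dot (IPv4/hostname only)."""
    host = host_header.strip().lower()
    if host.count(":") == 1:  # "name:port"; IPv6 literals are not handled here
        host = host.split(":", 1)[0]
    return host.rstrip(".")


def is_foreign_host(host_header: str, owned_domains: Iterable[str]) -> bool:
    """True if the Host names neither an owned domain nor one of its subdomains."""
    host = normalize_host(host_header)
    return not any(host == d or host.endswith("." + d) for d in owned_domains)
```

The suffix test requires a leading dot, so a lookalike such as evilexample.org is still flagged when you own example.org.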

Fourth, treat cloud metadata and internal addresses as high-severity Host header values in your WAF. These include 169.254.169.254, 100.100.100.200, and the 100.64.0.0/10 block that contains 100.88.222.5. Any server-side code that makes outbound connections using user-supplied Host values is a live SSRF risk.
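For IP-literal Host values, this rule can be sketched with Python's `ipaddress` module. The labels returned here are hypothetical; map them to whatever severities your WAF uses. 100.64.0.0/10 is checked explicitly because Python's `is_private` deliberately excludes that shared address space.

```python
import ipaddress

# Metadata endpoints named in this post: AWS-style and Alibaba Cloud.
METADATA_IPS = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("100.100.100.200"),
}
# RFC 6598 shared address space; is_private is False for it, so check directly.
SHARED_SPACE = ipaddress.ip_network("100.64.0.0/10")


def host_ip_severity(host_header: str) -> str:
    """Classify an IP-literal Host value: metadata, non-public, public, or not-an-ip."""
    host = host_header.strip()
    if host.count(":") == 1:  # drop ":port" (IPv4 only; IPv6 literals not handled)
        host = host.split(":", 1)[0]
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return "not-an-ip"
    if ip in METADATA_IPS:
        return "metadata"
    if ip in SHARED_SPACE or ip.is_private or ip.is_link_local \
            or ip.is_loopback or ip.is_unspecified:
        return "non-public"
    return "public"
```

Run against this dataset's bogon probes, 100.100.100.200 comes back as "metadata", while 100.88.222.5 and 0.0.0.0 come back as "non-public". Both labels deserve high-severity handling. 1.1.1.1:443 is "public" and falls to ordinary foreign-host rules.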

Final Assessment

This dataset represents automated pre-attack infrastructure mapping, not targeted exploitation. The actors here are building a map — finding which servers on the internet leak origin behavior, accept arbitrary Host headers, or could be weaponized as SSRF stepping stones. The real attacks come later, once that map is drawn.

The geographic patterns are consistent, the tooling is varied, and the targets are deliberate. If your server appears in logs like these, you're already on someone's list.

IOC feed and live telescope data: https://sayor.net/host%20header%20attacks.php

Intelligence sourced from the SAYOR.net internet telescope — a passive sensor capturing and categorizing real-world HTTP reconnaissance targeting internet infrastructure.