Learn how to detect and block bad bot traffic using IP addresses, IP ranges, and user agent analysis. Protect your website from scraping, attacks, and distorted analytics with server-level and Cloudflare rules.

How to Identify and Block Bad Bot Traffic by IP Range and User Agent

Web traffic isn’t just humans anymore. Bots — both good and bad — crawl websites at massive scale. Search engines, monitoring tools, and uptime checkers are helpful. But bad bots — scraping content, harvesting emails, or launching automated attacks — can harm performance, skew analytics, and expose security weaknesses.

In this article, we’ll explore how to identify bad bot traffic using:

  • IP addresses and IP ranges
  • User agent strings

And how to block them at the server or application level.

Why Bot Traffic Matters

Before we dive into code and tools, it’s important to understand the impact of bad bot traffic:

  • Skewed analytics and performance metrics
  • Server load and increased hosting costs
  • Content scraping and intellectual property theft
  • Credential stuffing and automated attacks

Many website owners are now proactively taking stronger steps against automated activity. For example, a growing number of U.S. site operators have begun blocking non‑U.S. traffic in an effort to reduce bot activity and large‑scale scraping. A detailed analysis of this trend can be found here:
https://www.searchen.com/2025/04/03/u-s-website-owners-increasingly-blocking-non-u-s-traffic-to-combat-bot-activity-and-scraping/

Step 1 — Detecting Bot Traffic

Check Your Analytics

Start with tools like Google Analytics 4:

  • Sudden spikes in sessions or pageviews
  • High bounce rates paired with abnormally long (or near-zero) session durations
  • Unusual referral sources

These patterns often signal non‑human traffic.

Analyze Server Logs

Raw logs contain client IP and user agent information:

123.45.67.89 - - [09/Feb/2026:13:45:01 +0000] "GET /index.html HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

In this example:

  • 123.45.67.89 is the client IP
  • The final quoted string is the user agent, which here identifies Googlebot

Server logs are one of the most reliable sources for determining traffic patterns, because they capture every request rather than only those that execute JavaScript.
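
To surface patterns quickly, you can summarize a log file from the command line. The commands below are a minimal sketch that assumes a combined-format Nginx access log at /var/log/nginx/access.log; adjust the path and field positions for your setup.

# Top 20 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20

# Most frequent user agents (field 6 when splitting on double quotes)
awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20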

Step 2 — Identify Suspicious IP Addresses and Ranges

Bots often originate from predictable IP blocks, particularly data center and hosting provider ranges. You can investigate a suspicious address in two ways:

Reverse DNS Lookup

Use a tool like dig or an online lookup service to see whether an IP resolves to a known provider:

dig -x 123.45.67.89 +short

If the hostname is generic, or belongs to a known hosting, proxy, or VPN provider, the traffic may well be automated.
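
Because bad bots frequently spoof well-known crawler user agents, a forward-confirmed reverse DNS check is worth the extra step: resolve the IP to a hostname, then resolve that hostname back and confirm it returns the original address. A sketch using an illustrative Googlebot address:

# Reverse lookup returns a hostname, e.g. crawl-66-249-66-1.googlebot.com.
dig -x 66.249.66.1 +short

# Forward lookup of that hostname should return the original IP
dig crawl-66-249-66-1.googlebot.com +short

If the two lookups don't agree, the "Googlebot" visitor is almost certainly an impostor.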

Consult IP Reputation Databases

Services such as IPinfo and Spamhaus, along with commercial threat intelligence feeds, maintain reputation data for individual IPs and entire blocks. An address with a poor reputation score is a strong candidate for blocking.
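
Many of these services expose simple HTTP APIs. As a quick sketch, an IPinfo lookup can be run with curl (confirm the endpoint and any token requirements against the provider's current documentation):

# Returns ownership, hostname, and ASN details for the address
curl -s https://ipinfo.io/123.45.67.89/json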

Step 3 — Identify Suspicious User Agents

Many bots identify themselves with distinctive user agent strings, while the worst offenders spoof ordinary browser user agents. Common markers to look for include the following (a quick log check appears after the list):

  • bot
  • crawler
  • spider
  • scraper
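
Assuming the same combined-format Nginx log as above, a quick check for these markers might look like this (note that the pattern also matches legitimate crawlers such as Googlebot):

# IPs whose user agent contains common bot markers, by request count
awk -F'"' 'tolower($6) ~ /bot|crawler|spider|scraper/' /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -20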

Step 4 — Block Bad Traffic by IP and User Agent

Nginx Example (IP Range Block)

# Block an entire bad IP range and a single address
# (deny directives belong in the http, server, or location context)
deny 123.45.67.0/24;
deny 98.76.54.123;

Nginx Example (User Agent Block)

# Return 403 for user agents containing common bot markers.
# Caution: this also matches legitimate crawlers such as Googlebot,
# so add exceptions if you still want search engines to index the site.
if ($http_user_agent ~* "(bot|crawler|spider|scraper)") {
    return 403;
}

Apache Example (mod_rewrite)

RewriteEngine On
# Deny any request whose user agent matches common bot markers (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (bot|crawler|spider|scraper) [NC]
RewriteRule .* - [F,L]
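
Apache Example (IP Range Block)

For completeness, here is a sketch of the equivalent IP block for Apache 2.4+ using Require directives; place it in the relevant Directory or virtual host configuration.

<RequireAll>
    Require all granted
    Require not ip 123.45.67.0/24
    Require not ip 98.76.54.123
</RequireAll>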

Blocking in Cloudflare

Cloudflare offers several ways to block bad traffic at the edge, before it reaches your server (an example rule expression follows the list):

  • IP Access Rules
  • Bot Fight Mode
  • Custom Firewall rules
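
As a sketch, a custom rule in the Cloudflare dashboard combining both signals might use an expression like the one below, with the action set to Block. Field names and syntax should be verified against Cloudflare's current Rules language documentation.

(ip.src in {123.45.67.0/24 98.76.54.123}) or (http.user_agent contains "scraper")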

Step 5 — Monitor and Adjust

Bad bots evolve constantly. Monitoring is essential:

  • Watch for new IP spikes
  • Log denied requests
  • Update patterns as needed

Automated tools like fail2ban or commercial services can help.
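
As a minimal sketch, fail2ban can ban IPs whose requests match a bad-bot pattern in the access log. The filter name, regex, and paths below are illustrative and assume a combined-format Nginx log; test the failregex with fail2ban-regex before enabling the jail.

# /etc/fail2ban/filter.d/nginx-badbots.conf
[Definition]
failregex = ^<HOST> \S+ \S+ \[[^\]]+\] "[^"]*" \S+ \S+ "[^"]*" "[^"]*(?:scraper|crawler|spider)[^"]*"\s*$
ignoreregex =

# /etc/fail2ban/jail.local
[nginx-badbots]
enabled  = true
port     = http,https
filter   = nginx-badbots
logpath  = /var/log/nginx/access.log
maxretry = 2
findtime = 600
bantime  = 86400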

Conclusion

Identifying bad bot traffic by IP range and user agent empowers you to clean up analytics, protect content, and secure your site. With server‑level rules or cloud firewall policies, you can mitigate the majority of automated abuse.

