Referrer Spam

What is it ?

Referrer Spam may also be known as "access_log spam", "stat spam", or other names. It makes it look as if sites like "incest-taboo.com", "asstraffic.biz" or "rape-stories.biz" link to your site and are sending it a lot of traffic. If you use a web-based statistics program like Webalizer or Urchin, these sites may appear as your highest sources of traffic.

The problems are many. First, your church group might not want to appear to be getting so much traffic from porn sites. Second, it skews your view of your real stats from real visitors. Third, the spammer's bot can consume a ton of bandwidth, which may cost you money.

Figure 1: Two Spam Bots consuming big bandwidth from a personal site:


Figure 2: Gee, I wonder what days the robot hit ?


Figure 3: What time did the Bot visit ?


Figure 4: Why did last month's stats get so popular this month ?

Why does it happen ?

As with e-mail spam, someone is paying the spammers for "advertising" or "spamvertising".

How to stop it ?

Option 1: Use your firewall.

Using your firewall is the method that requires the fewest resources from your servers, because the request is dropped before it ever reaches the web server.

With IPTables (Linux):
	iptables -A INPUT -p tcp -s 64.124.222.172 -j REJECT 
With PF (BSD):
	block in quick inet from 64.124.222.172 to any
The Windows 2000 and XP built-in "IP Filtering" can't be configured to block single addresses. Use a network-based firewall product or another OS.

Option 2: mod_rewrite or equivalent

Here is an example of how to use Apache's Rewrite Engine (mod_rewrite). It takes any request whose HTTP_REFERER field contains any of the patterns below and returns a "Forbidden" to the requestor. The patterns are unanchored, so each one matches anywhere in the referrer string.

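	# The rewrite engine must be switched on in this context for the rules to take effect
	RewriteEngine On
	RewriteCond %{HTTP_REFERER} allinternal\.biz [OR]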
	RewriteCond %{HTTP_REFERER} djhits\.com [OR]
	RewriteCond %{HTTP_REFERER} asstraffic\.biz [OR]
	RewriteCond %{HTTP_REFERER} ass-traffic\.biz [OR]
	RewriteCond %{HTTP_REFERER} drtushy\.biz [OR]
	RewriteCond %{HTTP_REFERER} "-cartoon" [OR]
	RewriteCond %{HTTP_REFERER} "-sex" [OR]
	RewriteCond %{HTTP_REFERER} "-naked" [OR]
	RewriteCond %{HTTP_REFERER} "incest-" [OR]
	RewriteCond %{HTTP_REFERER} "teen-" [OR]
	RewriteCond %{HTTP_REFERER} "xxx" [OR]
	RewriteCond %{HTTP_REFERER} "-rape" [OR]
	RewriteCond %{HTTP_REFERER} "-stories" [OR]
	RewriteCond %{HTTP_REFERER} "hardcore"
	RewriteRule .* - [F,L] 

Option 3: Block requests from known spam bots with your web server

While option 2 above will block requests with certain characteristics, it may prove easier to just block the source of the requests if they seem to come from only a few sources.
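To judge whether the spam really does come from only a few sources, you can count spam hits per client address in your access_log. The following is a minimal sketch, not a finished tool. It assumes Apache's "combined" log format, it assumes a short list of spam hostnames taken from the examples in this article, and the function name spam_sources is made up for illustration.

```python
import re
from collections import Counter

# Assumption: spam referrers are recognised by these hostnames,
# copied from the examples in this article. Extend as needed.
SPAM_RE = re.compile(
    r"allinternal\.biz|djhits\.com|asstraffic\.biz|ass-traffic\.biz|drtushy\.biz",
    re.IGNORECASE,
)

# Assumption: "combined" log format, where the line starts with the client
# host and ends with the quoted referrer followed by the quoted user agent.
LINE_RE = re.compile(r'^(?P<host>\S+) .* "(?P<referer>[^"]*)" "[^"]*"\s*$')


def spam_sources(lines):
    """Return (host, hit_count) pairs for spam requests, busiest host first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and SPAM_RE.search(match.group("referer")):
            counts[match.group("host")] += 1
    return counts.most_common()
```

Calling spam_sources(open("access_log")) returns a list you can read from the top down. If one or two addresses account for nearly all of the hits, blocking them by source is probably the easier route.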

Apache:
Use the following directives. As with other Apache directives, these can be applied to directories, virtual hosts, or the whole darn server.
	Order Allow,Deny
	Allow from all
	Deny from sys53.3fe.net 64.124.222.172 anotherspammer.biz
Microsoft IIS:
See this screen capture of the appropriate place to deny access to your server from specific hosts.

What can you do ?

The robots.txt file

While this method does not actually prevent the spam bots from hammering your site, it does eliminate one of the intended results of spamming your stats. All reputable search engines obey the robots.txt file. By instructing robots (search engine crawlers) not to crawl your stats pages, you keep them from seeing the links to the spammers' sites.
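As an illustration, here is what such a robots.txt might look like, together with a quick check of how a well-behaved crawler would interpret it. This is only a sketch: the /stats/ path and the example.com URLs are assumptions (use whatever directory your stats program writes to), and can_crawl is a hypothetical helper built on Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the stats pages live under /stats/ on your site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def can_crawl(robots_txt, url, agent="ExampleBot"):
    """Return True if a crawler that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The stats pages are off limits, the rest of the site is not.
print(can_crawl(ROBOTS_TXT, "http://www.example.com/stats/index.html"))
print(can_crawl(ROBOTS_TXT, "http://www.example.com/index.html"))
```

The two lines in ROBOTS_TXT are what you would save as robots.txt in the root of your site. The Python around them is only there to confirm the rule does what you expect.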

Password protect your statistics pages

Crawlers and robots can't see your stats if they are password protected. They will not attempt to log in to crawl the pages. If you want people to be able to see your stats, tell your users what the password is on the page that links to your stats.

Complain to the ISPs that host the bots

Usually abuse@example.com is a valid e-mail address for such a thing. Perhaps even call them on the telephone. Tell them what is happening. Tell them you feel their customer is abusing your site, costing you money, and is in violation of the spirit of their AUP (Acceptable Use Policy). At this time, this kind of spam is relatively new, and most AUPs address abusive behavior in general and e-mail spam specifically, but not this kind of spam.

Pre-process the logs

While no script yet exists for this purpose, if one were to remove the offending entries from the access_log before running Webalizer, those entries would never turn into links to the web sites of the spam sponsors. Contributions and/or links welcome !
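As a starting point, here is a minimal sketch of what such a pre-processing step could look like. It is not a tested production tool. It assumes Apache's "combined" log format, it reuses the hostname patterns from the mod_rewrite example as a blocklist, and the function names are made up for illustration. A user agent containing escaped quotes may not parse, in which case the line is kept.

```python
import re

# Assumption: blocklist copied from the hostname patterns in the
# mod_rewrite example. Add your own spammers here.
SPAM_REFERRER_PATTERNS = [
    r"allinternal\.biz",
    r"djhits\.com",
    r"asstraffic\.biz",
    r"ass-traffic\.biz",
    r"drtushy\.biz",
]
SPAM_RE = re.compile("|".join(SPAM_REFERRER_PATTERNS), re.IGNORECASE)

# Assumption: "combined" log format, where each line ends with
# "referrer" "user-agent".
COMBINED_TAIL_RE = re.compile(r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"\s*$')


def extract_referrer(line):
    """Return the referrer field of a log line, or None if it does not parse."""
    match = COMBINED_TAIL_RE.search(line)
    return match.group("referer") if match else None


def is_spam(line):
    """True if the line's referrer matches one of the blocklist patterns."""
    referrer = extract_referrer(line)
    return referrer is not None and SPAM_RE.search(referrer) is not None


def filter_log(lines):
    """Return only the log lines that are not referrer spam."""
    return [line for line in lines if not is_spam(line)]
```

To use it, read the access_log, pass its lines through filter_log, write the result to a new file, and point Webalizer at that file instead of the original. Lines that cannot be parsed are kept rather than dropped, so a format mismatch leaves your stats unfiltered instead of empty.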

Other Resources