mdibb.net

Dealing with and preventing Referrer Spam

Recently I started to get a whole load of referrer spam (i.e. where someone has created some client to repeatedly access your page with some spam URL (usually porn) as the referring site, in the hope that someone will stumble over it in your logs and click, or that it will increase their PageRank with Google from all the extra links). It's swamping the logs and hiding all of my real statistics, plus it turns out they've been coming back 2 or 3 times a day and spamming the site for a minute or two before moving on. Makes you wonder why they've chosen little old me with my 0/10 PageRank! Anyway, time to deal with it.



Looking at the raw log files gives me a clear idea of what is going on - each day there is a large number of hits for the root index, each with one of a few referring URLs. Let's look at an example (I've mangled the URL here, obviously, and I've hidden the IP in case some legitimate user gets it again in the near future):

209.123.8.xxx - - [20/Jun/2005:16:59:25 +0100] "GET / HTTP/1.1" 200 1381 "http://gallery.<something>.biz/hentai-movies.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1)"

There are about 800 of those requests so far today, each of them using an apparently random user agent string. A quick awk shows us the 39 variations they are using:

"Java/1.4.1_04"
"Mozilla/3.0(compatible)"
"Mozilla/4.0(compatible;MSIE5.0;Windows98;DigExt)"
"Mozilla/4.0(compatible;MSIE5.01;Windows98)"
"Mozilla/4.0(compatible;MSIE5.01;WindowsNT5.0)"
"Mozilla/4.0(compatible;MSIE5.5;Windows98)"
"Mozilla/4.0(compatible;MSIE5.5;Windows98;(R11.3))"
"Mozilla/4.0(compatible;MSIE5.5;Windows98;Win9x4.90)"
"Mozilla/4.0(compatible;MSIE5.5;WindowsNT5.0)"
"Mozilla/4.0(compatible;MSIE6.0;Windows98)"
"Mozilla/4.0(compatible;MSIE6.0;Windows98;.NETCLR1.1.4322)"
"Mozilla/4.0(compatible;MSIE6.0;Windows98;Win9x4.90)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.0)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.0;.NETCLR1)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.0;AT&TCSM6.0)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;.NETCLR1.0.3705)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;Copper.Net)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;FeatExt18)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;FunWebProd)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;Opera7.54)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;SV1)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;SV1;.NET)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;SV1;.NETCLR1.1.4322)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;Wanadoo6.0)"
"Mozilla/4.0(compatible;MSIE6.0;WindowsNT5.1;YComp5.0.0.0;SV1;.NETCLR1.0.3705)"
"Mozilla/4.0+(compatible;)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;de-DE;rv:1.7.5;Gecko/2004)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.4;Gecko/2004)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.2;Gecko/2004)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.5)Gecko/20041107Firefox/1.0"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.5;Gecko/2004)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.6)Gecko/20050225Firefox/1.0.1"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.6)Gecko/20050317Firefox/1.0.2"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.7)Gecko/20050414Firefox/1.0.3"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7.8)Gecko/20050511Firefox/1.0.4"
"Mozilla/5.0(Windows;U;WindowsNT5.1;en-US;rv:1.7;Gecko/2004)"
"Mozilla/5.0(Windows;U;WindowsNT5.1;rv:1.7.3;Gecko/2004)"
"Mozilla/5.0+(Windows;+U;+Windows+NT+5.1;+es-ES;+rv:1.7.5)"
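For anyone wanting to pull a similar list from their own logs, here is a rough sketch of the kind of one-liner involved. It is not the exact command used above (that one evidently kept the quotes and dropped the spaces). It assumes the usual Apache combined log format, and the function name `list_agents` and the file name `access.log` are just placeholders:

```shell
# Print the distinct user agent strings sent by one client IP.
# Usage: list_agents <ip> <logfile>
list_agents() {
    # Splitting each line on double quotes gives, for the combined log format:
    #   $1 = 'ip - - [timestamp] '   $2 = request line
    #   $4 = referrer                $6 = user agent
    # index(...) == 1 keeps only lines whose first field starts with "<ip> ",
    # so 10.0.0.1 does not also match 10.0.0.10.
    awk -F'"' -v ip="$1" 'index($1, ip " ") == 1 { print $6 }' "$2" | sort -u
}
```

Something like `list_agents 192.0.2.1 access.log | wc -l` would then give the number of variations (192.0.2.1 is a made-up documentation address, not the spammer's). Printing `$4` instead of `$6` would list the referring URLs in the same way.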

One would assume that they are using the random user agents in an effort to prevent filtering based on user agent, or at least to look like normal traffic (although what the "Java/1.4.1_04" one is doing there is anyone's guess). They've also got some semi-normal looking URLs in the referrer field, so as to look like normal sites rather than the blatant porn ones, but they just redirect to some porn site anyway. The stupid thing is, they've gone to all of this effort with random user agents and loads of different referring sites, but they're using the same IP address every time! Luckily enough my hosting company has the rewrite engine installed, so it's pretty simple to forbid this IP from accessing the site again using a simple addition to my .htaccess file:

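RewriteEngine On
# The pattern is a regular expression, so the dots are escaped to match literally
RewriteCond %{REMOTE_ADDR} ^209\.123\.8\.xxx$
# Answer every request from that address with 403 Forbidden
RewriteRule .* - [F,L]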

Hopefully that will quieten things down for a while. No doubt someone will be back with a new IP again soon, but it will be easy to just whack another line in the .htaccess file.
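As a sketch of what "another line" might look like once there is more than one address to block: several RewriteCond lines are ANDed together by default, so each one except the last needs the [OR] flag. The addresses below are invented examples from the reserved documentation ranges, not real spammers:

```apache
RewriteEngine On
# Conditions are ANDed by default; [OR] lets any single match trigger the rule
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.25$
# Answer matching requests with 403 Forbidden
RewriteRule .* - [F,L]
```

Forgetting the [OR] would mean the rule only fires when a request matches both addresses at once, which can never happen, so nothing would be blocked.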

23.02.2006.