Block Referer Spam via .htaccess?



howard
09-21-2005, 08:10 AM
Would adding the following code to an .htaccess file be a good way to block referer spam? I have posted the code as text, due to the text box problem I posted about elsewhere on the forum.

--------code:



RewriteEngine on

# Block Referrer Spam

# Drugs / Herbal

RewriteCond %{HTTP_REFERER} (sleep-?deprivation) [NC,OR]
RewriteCond %{HTTP_REFERER} (sleep-?disorders) [NC,OR]
RewriteCond %{HTTP_REFERER} (insomnia) [NC,OR]
RewriteCond %{HTTP_REFERER} (phentermine) [NC,OR]
RewriteCond %{HTTP_REFERER} (phentemine) [NC,OR]
RewriteCond %{HTTP_REFERER} (vicodin) [NC,OR]
RewriteCond %{HTTP_REFERER} (hydrocodone) [NC,OR]
RewriteCond %{HTTP_REFERER} (levitra) [NC,OR]
RewriteCond %{HTTP_REFERER} (hgh-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-hgh) [NC,OR]
RewriteCond %{HTTP_REFERER} (ultram-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-ultram) [NC,OR]
RewriteCond %{HTTP_REFERER} (cialis) [NC,OR]
RewriteCond %{HTTP_REFERER} (soma-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-soma) [NC,OR]
RewriteCond %{HTTP_REFERER} (diazepam) [NC,OR]
RewriteCond %{HTTP_REFERER} (gabapentin) [NC,OR]
RewriteCond %{HTTP_REFERER} (celebrex) [NC,OR]
RewriteCond %{HTTP_REFERER} (viagra) [NC,OR]
RewriteCond %{HTTP_REFERER} (fioricet) [NC,OR]
RewriteCond %{HTTP_REFERER} (ambien) [NC,OR]
RewriteCond %{HTTP_REFERER} (valium) [NC,OR]
RewriteCond %{HTTP_REFERER} (zoloft) [NC,OR]
RewriteCond %{HTTP_REFERER} (finasteride) [NC,OR]
RewriteCond %{HTTP_REFERER} (lamisil) [NC,OR]
RewriteCond %{HTTP_REFERER} (meridia) [NC,OR]
RewriteCond %{HTTP_REFERER} (allegra) [NC,OR]
RewriteCond %{HTTP_REFERER} (diflucan) [NC,OR]
RewriteCond %{HTTP_REFERER} (zovirax) [NC,OR]
RewriteCond %{HTTP_REFERER} (valtrex) [NC,OR]
RewriteCond %{HTTP_REFERER} (lipitor) [NC,OR]
RewriteCond %{HTTP_REFERER} (proscar) [NC,OR]
RewriteCond %{HTTP_REFERER} (acyclovir) [NC,OR]
RewriteCond %{HTTP_REFERER} (sildenafil) [NC,OR]
RewriteCond %{HTTP_REFERER} (tadalafil) [NC,OR]
RewriteCond %{HTTP_REFERER} (xenical) [NC,OR]
RewriteCond %{HTTP_REFERER} (melatonin) [NC,OR]
RewriteCond %{HTTP_REFERER} (xanax) [NC,OR]
RewriteCond %{HTTP_REFERER} (herbal) [NC,OR]
RewriteCond %{HTTP_REFERER} (drugs) [NC,OR]
RewriteCond %{HTTP_REFERER} (lortab) [NC,OR]
RewriteCond %{HTTP_REFERER} (adipex) [NC,OR]
RewriteCond %{HTTP_REFERER} (propecia) [NC,OR]
RewriteCond %{HTTP_REFERER} (carisoprodol) [NC,OR]
RewriteCond %{HTTP_REFERER} (tramadol) [NC]
RewriteRule .* - [F]

# Porn

RewriteCond %{HTTP_REFERER} (porno) [NC,OR]
RewriteCond %{HTTP_REFERER} (shemale) [NC,OR]
RewriteCond %{HTTP_REFERER} (gangbang) [NC,OR]
RewriteCond %{HTTP_REFERER} (-cock) [NC,OR]
RewriteCond %{HTTP_REFERER} (-anal) [NC,OR]
RewriteCond %{HTTP_REFERER} (-orgy) [NC,OR]
RewriteCond %{HTTP_REFERER} (cock-) [NC,OR]
RewriteCond %{HTTP_REFERER} (anal-) [NC,OR]
RewriteCond %{HTTP_REFERER} (orgy-) [NC,OR]
RewriteCond %{HTTP_REFERER} (singles-?christian) [NC,OR]
RewriteCond %{HTTP_REFERER} (dating-?christian) [NC,OR]
RewriteCond %{HTTP_REFERER} (cumeating) [NC,OR]
RewriteCond %{HTTP_REFERER} (cream-?pies) [NC,OR]
RewriteCond %{HTTP_REFERER} (cumsucking) [NC,OR]
RewriteCond %{HTTP_REFERER} (cumswapping) [NC,OR]
RewriteCond %{HTTP_REFERER} (cumfilled) [NC,OR]
RewriteCond %{HTTP_REFERER} (cumdripping) [NC,OR]
RewriteCond %{HTTP_REFERER} (krankenversicherung) [NC,OR]
RewriteCond %{HTTP_REFERER} (********) [NC,OR]
RewriteCond %{HTTP_REFERER} (suckingcum) [NC,OR]
RewriteCond %{HTTP_REFERER} (drippingcum) [NC,OR]
RewriteCond %{HTTP_REFERER} (********) [NC,OR]
RewriteCond %{HTTP_REFERER} (swappingcum) [NC,OR]
RewriteCond %{HTTP_REFERER} (eatingcum) [NC,OR]
RewriteCond %{HTTP_REFERER} (***-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-***) [NC,OR]
RewriteCond %{HTTP_REFERER} (sperm) [NC,OR]
RewriteCond %{HTTP_REFERER} (christian-?dating) [NC,OR]
RewriteCond %{HTTP_REFERER} (jewish-?singles) [NC,OR]
RewriteCond %{HTTP_REFERER} (sex-?meetings) [NC,OR]
RewriteCond %{HTTP_REFERER} (swinging) [NC,OR]
RewriteCond %{HTTP_REFERER} (swingers) [NC,OR]
RewriteCond %{HTTP_REFERER} (personals) [NC,OR]
RewriteCond %{HTTP_REFERER} (sleeping) [NC,OR]
RewriteCond %{HTTP_REFERER} (libido) [NC,OR]
RewriteCond %{HTTP_REFERER} (grannies) [NC,OR]
RewriteCond %{HTTP_REFERER} (mature) [NC,OR]
RewriteCond %{HTTP_REFERER} (enhancement) [NC,OR]
RewriteCond %{HTTP_REFERER} (sexual) [NC,OR]
RewriteCond %{HTTP_REFERER} (gay-?teen) [NC,OR]
RewriteCond %{HTTP_REFERER} (teen-?chat) [NC,OR]
RewriteCond %{HTTP_REFERER} (gay-?chat) [NC,OR]
RewriteCond %{HTTP_REFERER} (adult-?finder) [NC,OR]
RewriteCond %{HTTP_REFERER} (adult-?friend) [NC,OR]
RewriteCond %{HTTP_REFERER} (friend-?finder) [NC,OR]
RewriteCond %{HTTP_REFERER} (friend-?adult) [NC,OR]
RewriteCond %{HTTP_REFERER} (finder-?adult) [NC,OR]
RewriteCond %{HTTP_REFERER} (finder-?friend) [NC,OR]
RewriteCond %{HTTP_REFERER} (discrete-?encounters) [NC,OR]
RewriteCond %{HTTP_REFERER} (cheating-?wives) [NC,OR]
RewriteCond %{HTTP_REFERER} (housewives) [NC,OR]
RewriteCond %{HTTP_REFERER} (\-sex\.) [NC,OR]
RewriteCond %{HTTP_REFERER} (xxx) [NC,OR]
RewriteCond %{HTTP_REFERER} (snowballing) [NC]
RewriteRule .* - [F]

# Weight

RewriteCond %{HTTP_REFERER} (fat-) [NC,OR]
RewriteCond %{HTTP_REFERER} (-fat) [NC,OR]
RewriteCond %{HTTP_REFERER} (diet) [NC,OR]
RewriteCond %{HTTP_REFERER} (pills) [NC,OR]
RewriteCond %{HTTP_REFERER} (weight) [NC,OR]
RewriteCond %{HTTP_REFERER} (supplement) [NC]
RewriteRule .* - [F]

# Gambling

RewriteCond %{HTTP_REFERER} (texas-?hold-?em) [NC,OR]
RewriteCond %{HTTP_REFERER} (poker) [NC,OR]
RewriteCond %{HTTP_REFERER} (casino) [NC,OR]
RewriteCond %{HTTP_REFERER} (blackjack) [NC]
RewriteRule .* - [F]

# Loans / Finance

RewriteCond %{HTTP_REFERER} (mortgage) [NC,OR]
RewriteCond %{HTTP_REFERER} (refinancing) [NC,OR]
RewriteCond %{HTTP_REFERER} (cash-?advance) [NC,OR]
RewriteCond %{HTTP_REFERER} (cash-?money) [NC,OR]
RewriteCond %{HTTP_REFERER} (pay-?day) [NC]
RewriteRule .* - [F]

# User Agents

RewriteCond %{HTTP_USER_AGENT} (Program\ Shareware|Fetch\ API\ Request) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Microsoft\ URL\ Control) [NC]
RewriteRule .* - [F]

# Misc / Specific Sites

RewriteCond %{HTTP_REFERER} (netwasgroup\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (nic4u\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (wear4u\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (foxmediasolutions\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (liveplanets\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (aeterna-tech\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (continentaltirebowl\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (chemsymphony\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (infolibria\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (globaleducationeurope\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (soma\.125mb\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (mitglied\.lycos\.de) [NC,OR]
RewriteCond %{HTTP_REFERER} (jroundup\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (feathersandfurvanlines\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (conecrusher\.org) [NC,OR]
RewriteCond %{HTTP_REFERER} (sbj-broadcasting\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (edthompson\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (codychesnutt\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (artsmallforsenate\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (axionfootwear\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (protzonbeer\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (candiria\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (bigsitecity\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (coresat\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (istarthere\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (amateurvoetbal\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (alleghanyeda\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (xadulthosting\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (datashaping\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (zick\.biz) [NC,OR]
RewriteCond %{HTTP_REFERER} (newprinceton\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (dvdsqueeze\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (xopy\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (webdevboard\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (devaddict\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (eaton-inc\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (whiteguysgroup\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (guestbookz\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (webdevsquare\.com) [NC,OR]
RewriteCond %{HTTP_REFERER} (indfx\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (snap\.to) [NC,OR]
RewriteCond %{HTTP_REFERER} (2y\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (astromagia\.info) [NC,OR]
RewriteCond %{HTTP_REFERER} (jixx\.de) [NC,OR]
RewriteCond %{HTTP_REFERER} (free-?sms) [NC]
RewriteRule .* - [F]
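
One thing worth checking before deploying a list like this is how many legitimate referers the broader keywords (diet, weight, mature, personals and so on) would also catch, since each pattern matches anywhere in the referring URL. The following is a minimal sketch in Python rather than Apache config, assuming that Python's `re` behaves the same as Apache's regex engine for these simple patterns. The sample referers are made up for illustration.

```python
import re

# A handful of the patterns from the rules above. Joining them with "|"
# mirrors what a chain of RewriteCond lines with [NC,OR] does:
# a case-insensitive, unanchored match on any one pattern.
PATTERNS = [
    r"sleep-?deprivation",
    r"phentermine",
    r"diet",
    r"weight",
    r"poker",
    r"free-?sms",
]
BLOCKED = re.compile("|".join(PATTERNS), re.IGNORECASE)


def is_blocked(referer: str) -> bool:
    """Return True if any pattern occurs anywhere in the referer string."""
    return BLOCKED.search(referer) is not None


# Hypothetical referers, made up for this illustration.
SAMPLES = [
    "http://cheap-phentermine.example/buy",    # spam, blocked as intended
    "http://example.org/weightlifting/faq",    # legitimate, but contains "weight"
    "http://example.org/Dietitian-Directory",  # legitimate, but contains "diet"
    "http://example.org/news/",                # no keyword, allowed
    "",                                        # no referer at all, allowed
]

if __name__ == "__main__":
    for referer in SAMPLES:
        verdict = "403" if is_blocked(referer) else "ok"
        print(f"{verdict:>3}  {referer!r}")
```

Running real referers from the logs through a check like this, before touching .htaccess, would show which keywords are too broad to use unanchored.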

wildjokerdesign
09-21-2005, 10:15 AM
I assume you mean blocking posts to a forum or other page that accepts input via a form, right? I guess it would work, but it seems like it might affect the speed of the site, since every request for a page has to be filtered through it. I don't know enough about it to say for sure. I am not even sure how the first part would capture anything, since as I understand it HTTP_REFERER holds the address of the page on another site where someone clicked a link. Now if it was checking QUERY_STRING then maybe, but it still does not make complete sense to me. Like I say, this is one of my weak areas.
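
On the question of how HTTP_REFERER could capture anything: the Referer header is supplied by the client, so a script can set it to any string without a link ever being clicked. A minimal sketch using Python's standard library, with hypothetical URLs; the request is only built and inspected, never sent.

```python
import urllib.request

# Hypothetical URLs, for illustration only. Nothing is sent over the
# network here; the request object is only built and inspected.
request = urllib.request.Request(
    "http://www.example.com/index.html",
    headers={"Referer": "http://spam-site.example/cheap-pills"},
)

# This is the value Apache would expose as %{HTTP_REFERER}.
forged_referer = request.get_header("Referer")
print(forged_referer)
```

Since the spammer chooses that string, the RewriteCond patterns above are matching against text the spammer wants to appear in the site's referer listings.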

howard
09-21-2005, 11:09 AM
Thanks Shawn, one of my weak areas as well. I will wait for more replies.
Howard

allyn
09-21-2005, 11:14 AM
What do you mean by referer spam?

If you mean comment spam, I would be surprised if spammers send valid referer information, and without it this technique is ineffective.

howard
09-21-2005, 11:33 AM
Shawn and Allyn,

It's described well here:
http://www.spywareinfo.com/articles/referer_spam/

Howard

wildjokerdesign
09-22-2005, 08:23 AM
OK, I see. Good article. I am not sure I like his idea of sending them on to another site, but giving them a forbidden is a great idea. I don't publish any such listing of my access logs, so I'm not sure it is worth the time to set up, but I do wonder whether sending the forbidden would cut back on the bandwidth used.

howard
09-22-2005, 09:38 AM
Rather than needing to look at the logs to find out which specific site to send the 403 to, there is a script which will log the unwanted visitors. I am working on testing it. Define unwanted? Make a subdirectory to which you deny bots access via a robots.txt exclusion rule. Place the bot detector script in that folder - voila! The only folks who should be listed in the output file are those who don't obey the robots exclusion. I may post it if I find it works well.
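
The bot trap described above could be sketched roughly as follows. This is not the script being tested, only an illustration of the idea in Python; the directory name, log format and function names are all hypothetical.

```python
import tempfile
import urllib.robotparser
from datetime import datetime, timezone
from pathlib import Path

TRAP_DIR = "/bot-trap/"  # hypothetical directory name

# The robots.txt exclusion that tells well-behaved crawlers to stay out.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_DIR}\n"


def obeys_robots(path: str) -> bool:
    """Return True if a crawler honoring ROBOTS_TXT may fetch this path."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)


def record_trap_hit(log_file: Path, ip: str, user_agent: str) -> None:
    """Append one line per visitor who requested a page inside the trap."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(f"{stamp}\t{ip}\t{user_agent}\n")


if __name__ == "__main__":
    print(obeys_robots("/index.html"))           # normal pages stay crawlable
    print(obeys_robots(TRAP_DIR + "index.html"))  # the trap is off limits
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "trap.log"
        record_trap_hit(log, "192.0.2.10", "ExampleBot/1.0")  # made-up visitor
        print(log.read_text(encoding="utf-8"), end="")
```

In a real setup, the page served from the trap directory would call something like `record_trap_hit` with the requester's address, and the resulting log would hold only clients that ignored the exclusion rule.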

I find it interesting that the largest user of bandwidth to one of my sites during this month is the MSN-Bot (this, from a Webalizer table, as text, so it's hard to read, but you get the idea):

Top 10 of 504 Total Sites By KBytes

 #  Hits           Files          KBytes         Visits        Hostname
 1   887  12.98%    854  14.82%   6143   8.00%    89   9.12%   msnbot.msn.com
 2    44   0.64%     44   0.76%   3819   4.98%     1   0.10%   222.66.55.110
 3    17   0.25%     17   0.30%   3612   4.71%     1   0.10%   user128.res146.jtibs.net
 4    25   0.37%     22   0.38%   3028   3.94%     1   0.10%   62.225.78.130
 5    15   0.22%     13   0.23%   2985   3.89%     1   0.10%   amarseille-152-1-41-179.w81-251.abo.wanadoo.fr
 6    15   0.22%     12   0.21%   2967   3.87%     1   0.10%   ip32.gte154.dsl-acs2.sea.iinet.com

SJP
09-24-2005, 10:23 PM
I've never looked at the logs once, but now this thread and the referenced document have gotten my interest. Anyone know where the web server log files are kept?

Thanks.

howard
09-25-2005, 03:28 AM
SJP,

On a VPS account, they are at:
/var/log/httpd/

You'll find:

referer.log
access.log
error.log

And their .gz backups, assuming your account is set to do those backups at a certain interval or file size.

The error.log is most helpful in solving internal server error problems.

Howard
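
Once the logs are located, finding which referers show up most often can be scripted. This is a minimal sketch in Python, assuming the access log uses Apache's "combined" format, where the referer is a quoted field after the byte count. The actual format on a given account may differ, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumes Apache's "combined" log format, where the referer is the first
# quoted field after the status code and byte count.
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "[^"]*"$'
)


def count_referers(lines):
    """Count requests per referer, skipping empty ("-") and unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match and match.group("referer") not in ("", "-"):
            counts[match.group("referer")] += 1
    return counts


# Made-up log lines for illustration; real ones would come from access.log.
SAMPLE_LOG = [
    '192.0.2.1 - - [21/Sep/2005:08:10:00 -0400] "GET / HTTP/1.1" 200 5120 '
    '"http://poker.example/" "Mozilla/4.0"',
    '192.0.2.2 - - [21/Sep/2005:08:11:00 -0400] "GET / HTTP/1.1" 200 5120 '
    '"http://poker.example/" "Mozilla/4.0"',
    '192.0.2.3 - - [21/Sep/2005:08:12:00 -0400] "GET /about.html HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for referer, hits in count_referers(SAMPLE_LOG).most_common():
        print(hits, referer)
```

Passing an open file instead of `SAMPLE_LOG` would work the same way, since the function only iterates over lines.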

SJP
09-25-2005, 02:13 PM
Much thanks.

SJP
11-03-2005, 10:44 PM
howard wrote:

Rather than needing to look at the logs to find out which specific site to send the 403 to, there is a script which will log the unwanted visitors. I am working on testing it. Define unwanted? Make a subdirectory to which you deny bots access via a robots.txt exclusion rule. Place the bot detector script in that folder - voila! The only folks who should be listed in the output file are those who don't obey the robots exclusion. I may post it if I find it works well.


Can the .htaccess file be modified on the fly and the changes put into effect immediately? For instance: only visitors that accessed either the robots.txt or index file first would be permitted. The two files would actually be scripts whose output is the file their name represents, but because they are executable they would be able to retrieve the remote IP of the requester and write the .htaccess file to permit it through. Everyone else who didn't use the "front door" would always be told files were not found.
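
A rough sketch of the file-writing half of that idea, in Python. It only renders the text of an allow-list .htaccess; the function name is hypothetical, and the directives shown are the Order/Deny/Allow syntax of Apache 1.3 through 2.2 (Apache 2.4 uses "Require ip" instead). Note that these directives produce a 403 Forbidden for everyone else, not the "not found" response described above.

```python
import ipaddress


def render_htaccess(allowed_ips):
    """Build .htaccess text that denies everyone except the listed IPs.

    Uses the Order/Deny/Allow directives of Apache 1.3 through 2.2;
    Apache 2.4 expresses the same thing with "Require ip".
    """
    lines = ["Order Deny,Allow", "Deny from all"]
    for ip in sorted(set(allowed_ips)):
        # Raises ValueError on anything that is not a literal IP address,
        # so request data cannot smuggle extra directives into the file.
        ipaddress.ip_address(ip)
        lines.append(f"Allow from {ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, made up for this example.
    print(render_htaccess(["198.51.100.7", "192.0.2.10"]), end="")
```

The script standing in for robots.txt or the index page would add the requester's address to the set and rewrite the file. Writing to a temporary file and renaming it over .htaccess would likely be safer than writing in place, so the server never reads a half-written file.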

The article on referrer spam writes:

Spammers decide which sites to spam by checking sites automatically such as blo.gs, weblogs.com, and popdex.com for blogs that have updated recently. They also may do a simple Google search for the phrase "recent referers".

My comment:

Having written my own website software, I'm not using any package that spammers know to look for, or whose function they can even recognize. This has been a great boon to defeating spam. I hardly get any: maybe two a day, and I've been on the net for over three years. That's certainly lopsided when simple resume-type sites are getting hundreds!

This measure would also alleviate WJD's concerns about the web server having to parse a giant list of exceptions every time an HTTP request is made. Instead, the list would only include the IP addresses of the visitors that were allowed.

Of course there is one drawback. Search engines typically list pages that belong to a website but are not necessarily the beginning page. This would cause a problem, because a person who tried to get at such a page directly would be treated as a spammer.

However this would prevent the scanning of pages (email harvesting) and loading random images as a way of propagating referrer spam. And it might even make a website somewhat invisible to unscrupulous users as I have learned through my own experience.

SJP

howard
11-04-2005, 08:14 AM
SJP wrote:

Can the .htaccess file be modified on the fly and the changes put into effect immediately?

Yes. That is discussed here:
www.webmasterworld.com/forum23/1281.htm

A Google search on "bot trap" yields this link:
http://www.kloth.net/internet/bottrap.php

I'm stuck between doing it manually, using an inordinately long .htaccess file, and automating the process, which is why I haven't posted further on the topic.

Howard