Thread: Spider Trap Anyone? |
|

November 17th, 2008, 04:39 PM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
I have been using a nice little php Spider Trap that I would be willing to share here if anyone does not have one and feels the need. It gives me great joy when it emails me that an IP has been blocked and the website it was blocked from.
It works when the bad bots follow a link on a 1px x 1px image that triggers the script. The script adds the bad IP to .htaccess and sends you the happy news.
You block your script in robots.txt a few days prior to setting it up so that bots that honor robots.txt are not blocked. It's really very simple to add to your pages.
__________________
|

November 17th, 2008, 05:01 PM
|
|
Member
Join Date: October 11th, 2008
Location: Missouri USA
Posts: 69
|
|
would love to see that code John...
|

November 17th, 2008, 05:14 PM
|
|
Moderator
Join Date: April 6th, 2006
Posts: 2,402
|
|
I would LOVE to see that... I spend more than an hour each week, reviewing logs & banning nasty little buggers that slow down my sites!
|

November 17th, 2008, 05:22 PM
|
|
Believe
Join Date: August 14th, 2006
Location: Dayton, Ohio
Posts: 1,815
|
|
Doesn't a trap put the bot into a loop on your site which would take up resources?
|

November 17th, 2008, 05:56 PM
|
|
ABW Ambassador
Join Date: January 17th, 2005
Location: Springfield, Missouri, USA
Posts: 3,206
|
|
Sure, John, I'd be interested in getting it.
Not exactly sure how to use it, but it sounds interesting. Long as it wouldn't hit my resources like knight01 mentioned.
You've got my email address if you wanted to send it attached to a sort of "how-to" email.
__________________
Generate more fake news.
|

November 17th, 2008, 07:11 PM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
Quote:
|
Originally Posted by knight01
Doesn't a trap put the bot into a loop on your site which would take up resources?
|
I haven't seen this and have been using this trap on all my sites for way more than a year.
I'll try to explain it here. The first thing you will need to do is decide on a file name and a directory for your php script. Then list that file in robots.txt so the good bots have time to see it and leave it alone. Try not to use the same name as listed here. If 10,000 people were using spider-trap.php, bots would learn to stay away from that file. Be creative in naming.
I add this to all my pages:
Code:
<a href="/terminate.php" onclick="return false" rel="nofollow">
<img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>
I put this near the bottom of my left column links on all my pages. You don't want to put it so close to a link that it accidentally gets clicked. It could probably be anywhere on the page, but near the top of the code seems to make sense to me. Just make a clear 1 x 1 .gif and link to it.
So you would first edit robots.txt by adding:
Code:
User-agent: *
Disallow: /terminate.php
You don't want to add your code to the pages until the good bots like Google have loaded your edited robots.txt which might take 2 or 3 days depending on your site.
Try it on one site until you are comfortable putting it on all. I'll post the rest of it in a day or so. I got most of this from another forum and really don't know who to give credit to.
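For anyone who wants to double-check the robots.txt step before going live, here is a minimal Python sketch (not part of the trap itself, which is PHP). It uses the standard library's `urllib.robotparser` to confirm that a bot obeying robots.txt would stay away from the trap file. The function name `is_trap_disallowed` is made up for this example, and the robots.txt text is the two lines from the post above.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /terminate.php
"""


def is_trap_disallowed(robots_txt, trap_path, user_agent="*"):
    """Return True if a robots.txt-obeying bot would stay away from trap_path.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, trap_path)


if __name__ == "__main__":
    # The trap file should be off limits; ordinary pages should not be.
    print(is_trap_disallowed(ROBOTS_TXT, "/terminate.php"))
    print(is_trap_disallowed(ROBOTS_TXT, "/index.html"))
```

This only checks the rules themselves. It cannot tell you whether the search engines have already re-read your robots.txt, so the waiting period still applies.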
__________________
|

November 17th, 2008, 07:25 PM
|
|
ABW Ambassador
Join Date: January 18th, 2005
Location: Slidell, LA
Posts: 645
|
|
Not to get off the thread but we use
http://www.projecthoneypot.org/index.php
anyone have any experience with them?
__________________
Danny W Bonin Jr
Bonin Group, Inc.
|

November 20th, 2008, 07:40 AM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
After looking through my old notes I can now give credit and a link to where I got the Spider Trap. I was going to post the code but he has it there and a better explanation. Like I said we have been using it for around 4 years on all sites and it works. I get several of these emails every day.
The fun part is getting an email as one of the bad bots is blocked. I added this to the bottom of the terminate.php script, or whatever you name it. It has to go inside the PHP tags, before the closing ?>, or it will not run.
$bad_bot_ip=StripSlashes($bad_bot_ip);
$to="your_email@yourdomain.com";
$email="bugalert@bugoff.com"; // Make up what you like to see coming in to your inbox
$message="IP number $bad_bot_ip just got blocked from yoursitename.com.";
mail($to,"Spider Blocked",$message,"From: $email\n");
As far as testing goes I don't follow Birdman. I find it easier to click on the 1 x 1 image link or enter the path to the trap file in your browser. If it works you will see "Goodbye" printed and then you are blocked from your site. Just go to your .htaccess file and delete your IP number from the top and you are back in.
If you run anything like Xenu over your site, it will block itself and you, so disable your trap first. In the years of using this I have never had an IP blocked in error, but if you happen to be unlucky don't blame me (disclaimer).
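If you would rather not hand-edit .htaccess after a test, the unblock step can be sketched in a few lines of Python. This is only an illustration and makes one assumption: that the trap writes one line per IP in the form `SetEnvIf Remote_Addr ^1\.2\.3\.4$ getout`, as in the .htaccess listing posted later in this thread. The names `ban_line` and `remove_ban` are hypothetical.

```python
def ban_line(ip):
    """Build the ban line for an IP, with dots escaped (assumed format)."""
    return "SetEnvIf Remote_Addr ^" + ip.replace(".", r"\.") + "$ getout"


def remove_ban(htaccess_text, ip):
    """Return the .htaccess text with the ban line for this one IP removed."""
    target = ban_line(ip)
    kept = [line for line in htaccess_text.splitlines() if line.strip() != target]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = (
        "SetEnvIf Remote_Addr ^203\\.0\\.113\\.7$ getout\r\n"
        "SetEnvIf Remote_Addr ^72\\.29\\.233\\.188$ getout\r\n"
        "<Files *>\r\n"
    )
    # 203.0.113.7 stands in for your own IP after a test visit.
    print(remove_ban(sample, "203.0.113.7"))
```

The sketch works on text only. Reading and writing the real file, and keeping a backup copy first, is left to you.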
__________________
|

November 20th, 2008, 07:56 AM
|
|
ABW Ambassador
Join Date: January 18th, 2005
Location: Slidell, LA
Posts: 645
|
|
John,
This might be a stupid question on my part but how does it determine what is a "good" bot and what is a "bad" bot?
__________________
Danny W Bonin Jr
Bonin Group, Inc.
|

November 20th, 2008, 08:47 AM
|
|
ABW Ambassador
Join Date: January 18th, 2005
Location: Mansfield, TX
Posts: 15,740
|
|
A good bot obeys robots.txt. A bad bot doesn't.
|

November 20th, 2008, 09:33 AM
|
|
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
|
|
Thank you John, this looks like it can be most helpful.
|

November 20th, 2008, 09:44 AM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
Quote:
|
A good bot obeys robots.txt. A bad bot doesn't.
|
Exactly! That's why it's important to list the trap file in robots.txt a few days before you set it all live.
I usually keep an eye on the .htaccess file so that the number of blocked IPs doesn't get close to 100. If there are too many, I throw the old ones out, since after a few months they don't seem to come back for their 403s.
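That housekeeping could be automated along these lines. This is a rough Python sketch under two assumptions: ban lines look like `SetEnvIf Remote_Addr ^...$ getout`, and the trap adds new bans at the top of the file, so the oldest ones are the furthest down. The helper name `prune_bans` is made up, and the default limit of 100 is just the number mentioned above.

```python
def prune_bans(htaccess_text, max_bans=100):
    """Keep only the newest max_bans ban lines; leave every other line alone."""
    kept = []
    seen = 0
    for line in htaccess_text.splitlines():
        stripped = line.strip()
        is_ban = stripped.startswith("SetEnvIf Remote_Addr ") and stripped.endswith(" getout")
        if is_ban:
            seen += 1
            if seen > max_bans:
                continue  # an older entry further down the file: drop it
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "\n".join([
        "SetEnvIf Remote_Addr ^72\\.29\\.233\\.188$ getout",
        "SetEnvIf Remote_Addr ^75\\.126\\.198\\.130$ getout",
        "SetEnvIf Remote_Addr ^213\\.228\\.185\\.13$ getout",
        "<Files *>",
        "order deny,allow",
        "</Files>",
    ])
    # With a limit of 2, only the two newest bans survive.
    print(prune_bans(sample, max_bans=2))
```

Lines that are not bans, such as the `allowsome` line and the `<Files *>` block, pass through untouched.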
__________________
|

November 22nd, 2008, 08:03 AM
|
|
ABW Ambassador
Join Date: January 18th, 2005
Location: Slidell, LA
Posts: 645
|
|
I set this up on Thursday and am looking forward to putting it in effect soon. Since Wednesday, 2 of my sites have been visited by bots that click on everything. They show up after I call it a day, so by the time I discover them it's too late.
__________________
Danny W Bonin Jr
Bonin Group, Inc.
|

November 22nd, 2008, 11:26 AM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
Quote:
|
Originally Posted by boningroup
I set this up on Thursday and am looking forward to putting it in effect soon.
|
I find that some seem to avoid the trap. Sometimes I'll manually add those to the top of the list in .htaccess.
I just got a notice that one was added to the top of this list:
Code:
SetEnvIf Remote_Addr ^72\.29\.233\.188$ getout
SetEnvIf Remote_Addr ^75\.126\.198\.130$ getout
SetEnvIf Remote_Addr ^213\.228\.185\.13$ getout
SetEnvIf Remote_Addr ^38\.100\.41\.102$ getout
SetEnvIf Remote_Addr ^65\.242\.250\.130$ getout
SetEnvIf Remote_Addr ^72\.51\.38\.94$ getout
SetEnvIf Remote_Addr ^213\.206\.94\.205$ getout
SetEnvIf Remote_Addr ^64\.91\.253\.229$ getout
SetEnvIf Remote_Addr ^79\.120\.190\.24$ getout
SetEnvIf Remote_Addr ^81\.193\.178\.144$ getout
SetEnvIf Remote_Addr ^62\.163\.70\.194$ getout
SetEnvIf Remote_Addr ^61\.250\.95\.201$ getout
SetEnvIf Remote_Addr ^213\.37\.178\.181$ getout
SetEnvIf Remote_Addr ^83\.42\.206\.52$ getout
SetEnvIf Remote_Addr ^87\.101\.4\.42$ getout
SetEnvIf Remote_Addr ^77\.248\.44\.230$ getout
SetEnvIf Remote_Addr ^38\.105\.83\.12$ getout
SetEnvIf Remote_Addr ^83\.194\.186\.28$ getout
SetEnvIf Remote_Addr ^74\.208\.68\.31$ getout
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
deny from 66.154.102
deny from 66.154.103
Deny from 196.201.64.0/19
deny from 213.136.96.0/19
allow from env=allowsome
</Files>
You can get creative and combine some IPs into ranges, like the ones at the bottom, but I can never remember how to do that and have to learn it every time. I just let them pile up most of the time.
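For working out those combined ranges, Python's standard `ipaddress` module can do the arithmetic. A small sketch follows. The helper names `block_range` and `covering_blocks` are invented for this example, and the addresses are the ones from the list above.

```python
import ipaddress


def block_range(cidr):
    """Return the first and last address covered by a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)


def covering_blocks(first_ip, last_ip):
    """Return the CIDR block(s) that exactly cover first_ip through last_ip."""
    first = ipaddress.IPv4Address(first_ip)
    last = ipaddress.IPv4Address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


if __name__ == "__main__":
    # What does "deny from 196.201.64.0/19" actually cover?
    print(block_range("196.201.64.0/19"))
    # The two partial lines 66.154.102 and 66.154.103 together span this range.
    print(covering_blocks("66.154.102.0", "66.154.103.255"))
```

So the `/19` line covers 196.201.64.0 through 196.201.95.255, and the two partial lines `66.154.102` and `66.154.103` could be written as the single block `66.154.102.0/23`.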
__________________
|

January 2nd, 2009, 03:22 PM
|
|
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
|
|
BUMP
I finally have gotten all of my sites set up and I have to say that I am now seeing sales where there used to be only clicks. I've tried other blocking methods that were not as successful as this one. Love getting those notifications when they're blocked too! If you have been plagued with finding your content borrowed all over the place, this can only help, since the copying would almost have to be done manually and so would not be quite so world-wide as it has been. I knew it would be good if I could block these bots, but .htaccess scripts didn't seem to do it as well. Thank you so much for sharing, John!
|

January 2nd, 2009, 09:11 PM
|
|
ABW Ambassador
Join Date: February 23rd, 2007
Location: Shreveport, LA
Posts: 1,112
|
|
Thanks for the bump, and thanks for the great info John. This script will definitely come in handy for me, as I have been going through raw data the last 2 days trying to find a bad bot that is going places it shouldn't be going.
|

January 2nd, 2009, 09:58 PM
|
|
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
|
|
I should mention that I had to ask John for help. Seems I forgot to create the directory called for in the script. It's in the instructions but somehow I skipped over it. I went back to check step by step and noticed it. I guess because the steps are spread out over time and I was trying to implement it on all the sites at the same time. As soon as I backed off and did one site right, the rest were simple. Too many bots change their IP so this way they get blocked every time, no matter what name or IP address they use because it's their behavior that locks them up.
|

January 30th, 2009, 09:42 AM
|
|
ABW Ambassador
Join Date: October 14th, 2007
Location: MA
Posts: 1,888
|
|
Let me see if I can follow the directions. I am planning to put this into place but want to make sure I've got it first, since I am trying to piece together instructions from two sites. I want to double check to see if I am following correctly. So....
Step 1) PUT IN ROBOTS.TXT
User-agent: *
Disallow: /getout.php
Step 2) CREATE FOLDER IN ROOT
Create folder /trap/
Step 3) CHANGE PERMISSION
Change chmod for .htaccess to 644 -rw-r--r--
Change chmod for getout.php to 755 -rwxr-xr-x
Step 4) CHANGE .HTACCESS
Add these lines to top of .htaccess
Quote:
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
<Files *>
order deny,allow
deny from env=getout
allow from env=allowsome
</Files>
|
Step 5) WAIT FOR A FEW DAYS FIRST, THEN ADD INVISIBLE GIF TO WEBPAGES AT BOTTOM OF THE PAGES
Quote:
|
<a href="/getout.php" onclick="return false" rel="nofollow"><img src="/images/clear.gif" width="1" height="1" border="0" alt="" /></a>
|
Step 6) ADD GETOUT.PHP
Quote:
<?php
// Lock directory used as a simple mutex; the /trap/ folder must already exist.
$lock_dir = $_SERVER["DOCUMENT_ROOT"] . "/trap/lock";
$filename = $_SERVER["DOCUMENT_ROOT"] . "/.htaccess";
// Escape the dots so the IP is matched literally in the SetEnvIf pattern.
$bad_bot_ip = str_replace(".", "\.", $_SERVER["REMOTE_ADDR"]);
$content = "SetEnvIf Remote_Addr ^" . $bad_bot_ip . "$ getout\r\n";

function make_lock_dir(){
    global $lock_dir;
    $key = @mkdir($lock_dir, 0777);
    $i = 0;
    // Retry up to 20 times if another request currently holds the lock.
    while ($key === FALSE && $i++ < 20) {
        clearstatcache();
        usleep(rand(5,85));
        $key = @mkdir($lock_dir, 0777);
    }
    // Return only after the loop, so all the retries actually happen.
    return $key;
}

function write_ban(){
    global $filename, $bad_bot_ip, $content, $lock_dir;
    // Read the current .htaccess and put the new ban line on top of it.
    $handle = fopen($filename, 'r');
    $content .= fread($handle, filesize($filename));
    fclose($handle);
    $handle = fopen($filename, 'w+');
    fwrite($handle, $content, strlen($content));
    fclose($handle);
    rmdir($lock_dir); // release the lock
    print "Goodbye!";
}

function stale_check(){
    global $lock_dir;
    // A lock older than 120 seconds is treated as stale and removed.
    if (fileatime($lock_dir) < time()-120){
        rmdir($lock_dir);
        if (make_lock_dir() !== False) write_ban();
    } else {
        exit;
    }
}

if (make_lock_dir() !== False) {
    write_ban();
} else {
    stale_check();
}

// Email notification. It has to sit before the closing tag or it will not run.
$bad_bot_ip = StripSlashes($bad_bot_ip);
$to = "your_email@yourdomain.com";
$email = "bugalert@bugoff.com"; // Make up what you like to see coming in to your inbox
$message = "IP number $bad_bot_ip just got blocked from yoursitename.com.";
mail($to, "Spider Blocked", $message, "From: $email\r\n");
?>
|
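To make it clearer what the script does to .htaccess, here is the same idea as a short Python sketch (the real trap is the PHP above, this is only an illustration). It builds the ban line with the dots escaped and puts it on top of the existing file text. It also shows why the escaping and the `^`/`$` anchors matter, using Python's `re` module as a stand-in for Apache's pattern matching. That stand-in is an assumption, but the two agree on a pattern this simple. The function names are made up.

```python
import re


def ban_pattern(remote_addr):
    """Anchored pattern that matches exactly one IP, dots taken literally."""
    return "^" + remote_addr.replace(".", r"\.") + "$"


def build_ban_line(remote_addr):
    """The line the trap adds for a visitor's IP (same format as the PHP)."""
    return "SetEnvIf Remote_Addr " + ban_pattern(remote_addr) + " getout\r\n"


def prepend_ban(htaccess_text, remote_addr):
    """New bans go on top; the rest of .htaccess is left as it was."""
    return build_ban_line(remote_addr) + htaccess_text


if __name__ == "__main__":
    print(prepend_ban("<Files *>\r\n</Files>\r\n", "72.29.233.188"))
    # With anchors, a ban on ...18 does not also catch ...188.
    print(re.search(ban_pattern("72.29.233.18"), "72.29.233.188"))
    # Without escaping, each dot would match any character at all.
    print(re.search("^72.29.233.18$", "72a29b233c18") is not None)
```

The lock directory and the email are left out here on purpose. The sketch only covers the text that ends up in .htaccess.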
================
My Questions:
* How does that look? Do step 1-4 first and then wait a few days?
* I am not quite sure what goes into the trap folder. Is the script just sending the bad bots there? Are there any files in there?
* Do I add getout.php to the root, like www.example.com/getout.php ?
thanks! I am hoping to join you all and get some of these pesky bots!
|

January 30th, 2009, 10:06 AM
|
|
ABW Ambassador
Join Date: January 17th, 2005
Location: Tropical Mountaintop
Posts: 5,407
|
|
The robots.txt changes go up first to be sure that the good bots don't get trapped, then you wait, then put up all the rest and do the chmods. Be sure to edit your email so you get a notification. Nothing goes in the trap directory. Everything else is just right.
To test it go to yourdomain.com/getout.php and see if you get locked out. If you don't, it's not right. If you do, reupload your .htaccess file and you're done. You will be happy you put it up. Thanks again, John!
|

January 30th, 2009, 10:13 AM
|
|
ABW Ambassador
Join Date: October 14th, 2007
Location: MA
Posts: 1,888
|
|
thanks 2busy and thank you John!
I was looking and searching for this thread yesterday looking under "bumpaw"... none such member to be found! glad I finally found it though!
Well, can't wait for my first email!
|

January 30th, 2009, 10:21 AM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
Quote:
|
* How does that look? Do step 1-4 first and then wait a few days?
|
Until you think the good search engines have read your robots file.
Quote:
|
* I am not quite sure what goes into the trap folder. Is the script just sending the bad bots there? Are there any files in there?
|
Just leave it empty. The script uses it. I went through and followed what it's for but don't remember. I put it in root. You could do otherwise if you change paths.
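From reading the script, the empty folder appears to be there so the script can create and remove a `lock` directory inside it. Creating a directory either succeeds or fails in one step, so it works as a simple "one at a time" flag when two bots hit the trap at the same moment and both try to rewrite .htaccess. Here is a rough Python sketch of that idea, not the PHP itself. The function names are hypothetical, and the 20 tries and the short random pause just mirror the numbers in the script.

```python
import os
import random
import tempfile
import time


def acquire_lock(lock_dir, attempts=20):
    """Try to create the lock directory; True means we now hold the lock."""
    for _ in range(attempts):
        try:
            os.mkdir(lock_dir)  # fails if another request already made it
            return True
        except FileExistsError:
            # Pause 5 to 85 microseconds before trying again.
            time.sleep(random.uniform(0.000005, 0.000085))
    return False


def release_lock(lock_dir):
    """Remove the lock directory so the next request can proceed."""
    os.rmdir(lock_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as trap_dir:
        lock = os.path.join(trap_dir, "lock")
        print(acquire_lock(lock))  # first caller gets the lock
        print(acquire_lock(lock))  # second caller is refused while it is held
        release_lock(lock)
        print(acquire_lock(lock))  # free again after release
        release_lock(lock)
```

The PHP also treats a lock older than 120 seconds as stale and removes it. That part is left out of the sketch to keep it short.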
I gave my script a creative file name so that it wouldn't be the same as all those following the original WMW thread, as a paranoid feature in case the bots are programmed to look for the file name.
Be sure and feel free to PM me if you need help.
|

January 30th, 2009, 10:44 AM
|
|
ABW Ambassador
Join Date: February 23rd, 2007
Location: Shreveport, LA
Posts: 1,112
|
|
The only thing I changed that is different would be #5. I added a style="display:none" so human visitors can't see it if their mouse happens to hover over it. I know the chance of that happening is slim to none, but I like to make sure it's none.
Here is what my code looks like:
<a href="/getlost.php" onclick="return false" style="display:none" rel="nofollow">
<img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>
I have to say also, since I've set up this little script on my pages, I have captured about 20 bad bots with it. Thanks again for posting this John, it has saved me a lot of time trying to find bad bots and worrying if other ones are roaming around on my site. Now I just sit back and wait for the emails.
|

January 30th, 2009, 11:24 AM
|
|
Comfortably Numb
Join Date: October 17th, 2005
Location: Bayou Country, LA
Posts: 3,431
|
|
Quote:
|
Originally Posted by Lanadili
The only thing I changed that is different would be #5. I added a style="display:none" so human visitors can't see it if their mouse happens to hover over it.
|
That looks like a good idea. Glad to see it's working for you.
I put my trap link up kind of high in the code as opposed to the bottom, just in case some bots are like Google and give priority to the items higher up. No idea if that's needed or not though.
|

January 30th, 2009, 11:42 AM
|
|
Visual Artist & ABW Ambassador
Join Date: September 7th, 2007
Location: Cuautitlán, Edo. de México
Posts: 1,725
|
|
Thanks for the tip Lan!
|

January 30th, 2009, 11:55 AM
|
|
ABW Ambassador
Join Date: October 14th, 2007
Location: MA
Posts: 1,888
|
|
Quote:
|
Originally Posted by Lanadili
Here is what my code looks like:
<a href="/getlost.php" onclick="return false" style="display:none" rel="nofollow">
<img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>
|
That is a good idea, I will add that in there! Thanks!
|