  1. #1
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Spider Trap Anyone?
    I have been using a nice little PHP spider trap that I would be willing to share here if anyone does not have one and feels the need. It gives me great joy when it emails me the IP that was blocked and the website it was blocked from.

    It works when the bad bots follow a link on a 1px x 1px image that triggers the script. The script adds the bad IP to .htaccess and sends you the happy news.

    You block your script in robots.txt a few days prior to setting it up so that bots that honor robots.txt are not blocked. It's really very simple to add to your pages.


  2. #2
    Member
    Join Date
    October 11th, 2008
    Posts
    69
    would love to see that code John...

  3. #3
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    I would LOVE to see that... I spend more than an hour each week, reviewing logs & banning nasty little buggers that slow down my sites!

  4. #4
    Believe knight01's Avatar
    Join Date
    August 14th, 2006
    Location
    Dayton, Ohio
    Posts
    1,815
    Doesn't a trap put the bot into a loop on your site which would take up resources?
    Someday starts today
    Military Discounts

  5. #5
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Sure, John, I'd be interested in getting it.

    Not exactly sure how to use it, but it sounds interesting. Long as it wouldn't hit my resources like knight01 mentioned.

    You've got my email address if you wanted to send it attached to a sort of "how-to" email.
    Generate more fake news.

  6. #6
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Quote Originally Posted by knight01
    Doesn't a trap put the bot into a loop on your site which would take up resources?
    I haven't seen this and have been using this trap on all my sites for way more than a year.

    I'll try to explain it here. The first thing you will need to do is decide on a file name and a directory for your PHP script. Then list that file in robots.txt so the good bots have time to see the rule and stay away from the trap. Try not to use the same names as listed here: if 10,000 people all used spider-trap.php, bad bots would learn to stay away from that file. Be creative in naming.

    I add this to all my pages:
    Code:
    <a href="/terminate.php" onclick="return false" rel="nofollow">
      <img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>
    I put this near the bottom of my left column links on all my pages. You don't want to put it so close to a real link that it accidentally gets clicked. It could probably be anywhere on the page, but near the top of the code seems to make sense to me. Just make a transparent 1 x 1 .gif and use it as the image inside the trap link.

    So you would first edit robots.txt by adding:
    Code:
    User-agent: *
    Disallow: /terminate.php
    You don't want to add your code to the pages until the good bots like Google have loaded your edited robots.txt which might take 2 or 3 days depending on your site.
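Before adding the trap link, it can help to confirm that the live robots.txt really does disallow the trap file. The following is a minimal sketch only: `robots_disallows` is a hypothetical helper that is not part of the original trap script, and it handles only plain `User-agent` and `Disallow` lines (no `Allow` rules, no wildcards).

```php
<?php
// Hypothetical helper: true when $robotsTxt has a "User-agent: *" group
// whose Disallow rule covers $trapPath. Simplified prefix matching only.
function robots_disallows(string $robotsTxt, string $trapPath): bool
{
    $inStarGroup = false;
    $prevWasAgent = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Drop comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            // Consecutive User-agent lines belong to the same group.
            if (!$prevWasAgent) {
                $inStarGroup = false;
            }
            if ($value === '*') {
                $inStarGroup = true;
            }
            $prevWasAgent = true;
            continue;
        }
        $prevWasAgent = false;
        if ($field === 'disallow' && $inStarGroup && $value !== '') {
            // A Disallow value covers any path that starts with it.
            if (strpos($trapPath, $value) === 0) {
                return true;
            }
        }
    }
    return false;
}

// Usage: paste in the contents of your own robots.txt.
$robots = "User-agent: *\nDisallow: /terminate.php\n";
$ready = robots_disallows($robots, '/terminate.php');
```

This only checks the file you have published. It cannot tell you whether a given search engine has re-read it yet, so the waiting period still applies.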

    Try it on one site until you are comfortable putting it on all. I'll post the rest of it in a day or so. I got most of this from another forum and really don't know who to give credit to.


  7. #7
    ABW Ambassador boningroup's Avatar
    Join Date
    January 18th, 2005
    Location
    Slidell, LA
    Posts
    667
    Not to get off the thread but we use

    http://www.projecthoneypot.org/index.php

    anyone have any experience with them?
    Danny W Bonin Jr
    Bonin Group, Inc.

  8. #8
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    After looking through my old notes I can now give credit and a link to where I got the Spider Trap. I was going to post the code but he has it there and a better explanation. Like I said we have been using it for around 4 years on all sites and it works. I get several of these emails every day.

    The fun part is getting an email as one of the bad bots is blocked. I added this to the bottom of the terminate.php script or whatever you name it.
    PHP Code:
    $bad_bot_ip=StripSlashes($bad_bot_ip); // undo the "\." escaping so the email shows a plain IP
    $to="your_email@yourdomain.com";
    $email="bugalert@bugoff.com"; // Make up what you like to see coming in to your inbox
    $message="IP number $bad_bot_ip just got blocked from yoursitename.com.";
    mail($to,"Spider Blocked",$message,"From: $email\n");
    As far as testing goes I don't follow Birdman. I find it easier to click on the 1 x 1 image link or enter the path to the trap file in your browser. If it works you will see "Goodbye" printed and then you are blocked from your site. Just go to your .htaccess file and delete your IP number from the top and you are back in.

    If you run anything like Xenu over your site, it will trip the trap and block you, so disable your trap first. In the years of using this I have never had an IP blocked in error, but if you happen to be unlucky, don't blame me (disclaimer).
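Deleting your own IP from .htaccess after a test can also be scripted. This is a rough sketch, assuming the ban lines have exactly the `SetEnvIf Remote_Addr ^...$ getout` shape the trap writes. `remove_ban` is a hypothetical helper name, and 192.0.2.10 is a documentation-only address.

```php
<?php
// Hypothetical helper: strip the ban line for one IPv4 address out of
// .htaccess text, leaving every other line untouched.
// Note: line endings in the result are normalized to "\n".
function remove_ban(string $htaccess, string $ip): string
{
    // The trap writes dots as "\." so the address is matched literally.
    $escapedIp = str_replace('.', '\.', $ip);
    $banLine = 'SetEnvIf Remote_Addr ^' . $escapedIp . '$ getout';
    $kept = [];
    foreach (preg_split('/\r\n|\r|\n/', $htaccess) as $line) {
        if (trim($line) !== $banLine) {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

// Usage on a string; reading and writing the real file is left to you.
$before = 'SetEnvIf Remote_Addr ^192\.0\.2\.10$ getout' . "\r\n"
        . 'SetEnvIf Remote_Addr ^192\.0\.2\.20$ getout' . "\r\n"
        . '<Files *>';
$after = remove_ban($before, '192.0.2.10');
```

Keep a backup copy of .htaccess before trying anything like this on a live site, since a damaged .htaccess can take the whole site down.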


  9. #9
    ABW Ambassador boningroup's Avatar
    Join Date
    January 18th, 2005
    Location
    Slidell, LA
    Posts
    667
    John,

    This might be a stupid question on my part but how does it determine what is a "good" bot and what is a "bad" bot?
    Danny W Bonin Jr
    Bonin Group, Inc.

  10. #10
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    A good bot obeys robots.txt. A bad bot doesn't.
    Michael Coley
    Amazing-Bargains.com
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela

  11. #11
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Thank you John, this looks like it can be most helpful.

  12. #12
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Quote Originally Posted by MichaelColey
    A good bot obeys robots.txt. A bad bot doesn't.
    Exactly! That's why it's important to list the trap file in robots.txt a few days before you set it all live.

    I usually keep an eye on the .htaccess file so that the number of blocked IPs doesn't get close to 100. If there are too many, I throw the old ones out, since after a few months they don't seem to come back for their 403s.
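Throwing out the old entries can be sketched in a few lines. Because the trap adds each new ban at the top of the file, the oldest bans are the ones lowest in the list. `prune_bans` below is a hypothetical helper, and the limit is whatever you choose.

```php
<?php
// Hypothetical helper: keep only the newest $keep ban lines. The trap
// prepends each new ban, so the oldest entries sit lowest in the list.
// Lines that are not "SetEnvIf Remote_Addr ... getout" are always kept.
function prune_bans(string $htaccess, int $keep): string
{
    $seen = 0;
    $kept = [];
    foreach (preg_split('/\r\n|\r|\n/', $htaccess) as $line) {
        $isBan = preg_match('/^SetEnvIf\s+Remote_Addr\s+\S+\s+getout\s*$/', $line) === 1;
        if ($isBan && ++$seen > $keep) {
            continue; // older than the newest $keep bans: drop it
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Usage with documentation-only addresses, newest first.
$list = implode("\n", [
    'SetEnvIf Remote_Addr ^192\.0\.2\.3$ getout',
    'SetEnvIf Remote_Addr ^192\.0\.2\.2$ getout',
    'SetEnvIf Remote_Addr ^192\.0\.2\.1$ getout',
    '<Files *>',
]);
$pruned = prune_bans($list, 2);
```

Manually added `deny from` lines inside the `<Files>` block are not touched by this sketch.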


  13. #13
    ABW Ambassador boningroup's Avatar
    Join Date
    January 18th, 2005
    Location
    Slidell, LA
    Posts
    667
    I set this up on Thursday and am looking forward to putting it in effect soon. Since Wednesday 2 of my sites have been visited by bots that click on everything, and they show up after I call it a day, so by the time I discover them it's too late.
    Danny W Bonin Jr
    Bonin Group, Inc.

  14. #14
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Quote Originally Posted by boningroup
    I set this up on Thursday and am looking forward to putting it in effect soon.
    I find that some seem to avoid the trap. Sometimes I'll manually add those to the top of the list in .htaccess.

    I just got a notice that one was added to the top of this list:
    Code:
    SetEnvIf Remote_Addr ^72\.29\.233\.188$ getout
    SetEnvIf Remote_Addr ^75\.126\.198\.130$ getout
    SetEnvIf Remote_Addr ^213\.228\.185\.13$ getout
    SetEnvIf Remote_Addr ^38\.100\.41\.102$ getout
    SetEnvIf Remote_Addr ^65\.242\.250\.130$ getout
    SetEnvIf Remote_Addr ^72\.51\.38\.94$ getout
    SetEnvIf Remote_Addr ^213\.206\.94\.205$ getout
    SetEnvIf Remote_Addr ^64\.91\.253\.229$ getout
    SetEnvIf Remote_Addr ^79\.120\.190\.24$ getout
    SetEnvIf Remote_Addr ^81\.193\.178\.144$ getout
    SetEnvIf Remote_Addr ^62\.163\.70\.194$ getout
    SetEnvIf Remote_Addr ^61\.250\.95\.201$ getout
    SetEnvIf Remote_Addr ^213\.37\.178\.181$ getout
    SetEnvIf Remote_Addr ^83\.42\.206\.52$ getout
    SetEnvIf Remote_Addr ^87\.101\.4\.42$ getout
    SetEnvIf Remote_Addr ^77\.248\.44\.230$ getout
    SetEnvIf Remote_Addr ^38\.105\.83\.12$ getout
    SetEnvIf Remote_Addr ^83\.194\.186\.28$ getout
    SetEnvIf Remote_Addr ^74\.208\.68\.31$ getout
    SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
    <Files *>
    order deny,allow
    deny from env=getout
    deny from 66.154.102
    deny from 66.154.103
    Deny from 196.201.64.0/19
    deny from 213.136.96.0/19
    allow from env=allowsome
    </Files>
    You can get creative and combine some IPs like at the bottom, but I can never remember how to do that and have to learn it every time. I just let them pile up most of the time.
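For the combined entries at the bottom of that list: `deny from 66.154.102` is a partial address that matches every IP starting with those three octets, and `196.201.64.0/19` is a CIDR block of 8,192 addresses running from 196.201.64.0 through 196.201.95.255. Before adding a block like that, a small check can show whether a given IP would be covered. This is a sketch only: `ip_in_cidr` is a hypothetical helper, and it assumes IPv4 and 64-bit PHP.

```php
<?php
// Hypothetical helper: true when IPv4 address $ip falls inside $cidr,
// for example "196.201.64.0/19". Assumes 64-bit PHP integers.
function ip_in_cidr(string $ip, string $cidr): bool
{
    if (strpos($cidr, '/') === false) {
        return false;
    }
    [$network, $bits] = explode('/', $cidr, 2);
    $bits = (int) $bits;
    $ipLong = ip2long($ip);
    $netLong = ip2long($network);
    if ($ipLong === false || $netLong === false || $bits < 0 || $bits > 32) {
        return false;
    }
    // Build a mask with $bits leading ones, e.g. /19 -> 255.255.224.0.
    $mask = $bits === 0 ? 0 : (~0 << (32 - $bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($netLong & $mask);
}

// Usage: check the edges of the /19 block from the list above.
$inside  = ip_in_cidr('196.201.95.255', '196.201.64.0/19');
$outside = ip_in_cidr('196.201.96.0', '196.201.64.0/19');
```

Blocking a whole range also blocks every innocent visitor in it, so it is worth checking who owns the range before adding it.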


  15. #15
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    BUMP
    I finally have gotten all of my sites set up and I have to say that I am now seeing sales where there used to be only clicks. I've tried other blocking methods that were not as successful as this one. Love getting those notifications when they're blocked too! If you have been plagued with finding borrowed content all over the place, this can only help, since the copying would almost have to be done by hand and so would not be as widespread as it has been. I knew it would be good if I could block these bots, but other .htaccess scripts didn't seem to do it as well. Thank you so much for sharing, John!

  16. #16
    ABW Ambassador Lanadili's Avatar
    Join Date
    February 23rd, 2007
    Location
    Shreveport, LA
    Posts
    1,114
    Thanks for the bump, and thanks for the great info John. This script will definitely come in handy for me, as I have been going through raw data the last 2 days trying to find a bad bot that is going places it shouldn't be going.

  17. #17
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    I should mention that I had to ask John for help. Seems I forgot to create the directory called for in the script. It's in the instructions but somehow I skipped over it. I went back to check step by step and noticed it. I guess because the steps are spread out over time and I was trying to implement it on all the sites at the same time. As soon as I backed off and did one site right, the rest were simple. Too many bots change their IP so this way they get blocked every time, no matter what name or IP address they use because it's their behavior that locks them up.
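A quick pre-flight check can catch the missing directory before the trap goes live. This is a minimal sketch, assuming the layout used in this thread (a `/trap` directory and `.htaccess` under the document root). `trap_preflight` is a hypothetical helper name.

```php
<?php
// Hypothetical pre-flight check: returns a list of problems that would
// stop the trap from writing a ban. An empty list means it looks ready.
function trap_preflight(string $docRoot, string $trapDir = '/trap'): array
{
    $problems = [];
    $dir = $docRoot . $trapDir;
    if (!is_dir($dir)) {
        $problems[] = "missing directory: $dir";
    } elseif (!is_writable($dir)) {
        $problems[] = "directory not writable: $dir";
    }
    $htaccess = $docRoot . '/.htaccess';
    if (!is_file($htaccess)) {
        $problems[] = "missing file: $htaccess";
    } elseif (!is_writable($htaccess)) {
        $problems[] = "file not writable: $htaccess";
    }
    return $problems;
}

// Usage: run it against a scratch directory first.
$root = sys_get_temp_dir() . '/trap_preflight_' . uniqid();
mkdir($root);
$problemsBefore = trap_preflight($root); // nothing has been set up yet
```

The check has to run as the same user the web server uses for PHP. Otherwise the writable results may not match what the trap script itself will see.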

  18. #18
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    Let me see if I can follow the directions. I am planning to put this into place, but I am piecing together instructions from two sites, so I want to double-check that I am following them correctly. So....


    Step 1) PUT IN ROBOTS.TXT

    User-agent: *
    Disallow: /getout.php

    Step 2) CREATE FOLDER IN ROOT

    Create folder /trap/

    Step 3) CHANGE PERMISSION

    Change chmod for .htaccess to 644 -rw-r--r--
    change chmod for getout.php to 755 -rwxr-xr-x

    Step 4) CHANGE .HTACCESS

    Add these lines to top of .htaccess
    SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
    <Files *>
    order deny,allow
    deny from env=getout
    allow from env=allowsome
    </Files>
    Step 5) WAIT FOR A FEW DAYS FIRST, THEN ADD INVISIBLE GIF TO WEBPAGES AT BOTTOM OF THE PAGES

    <a href="/getout.php" onclick="return false" rel="nofollow"><img src="/images/clear.gif" width="1" height="1" border="0" alt="" /></a>
    Step 6) ADD GETOUT.PHP

    <?php
    // Lock directory (created and removed on each ban) and the file to edit.
    $lock_dir = $_SERVER["DOCUMENT_ROOT"] . "/trap/lock";
    $filename = $_SERVER["DOCUMENT_ROOT"] . "/.htaccess";
    // Escape the dots so the IP is matched literally by SetEnvIf.
    $bad_bot_ip = str_replace(".", "\.", $_SERVER["REMOTE_ADDR"]);
    $content = "SetEnvIf Remote_Addr ^" . $bad_bot_ip . "$ getout\r\n";

    // Take the lock by creating the directory, retrying up to 20 times.
    function make_lock_dir(){
        global $lock_dir;
        $key = @mkdir($lock_dir, 0777);
        $i = 0;
        while ($key === FALSE && $i++ < 20) {
            clearstatcache();
            usleep(rand(5,85));
            $key = @mkdir($lock_dir, 0777);
        }
        return $key; // returned after the loop so every retry gets a chance
    }

    // Prepend the new ban line to .htaccess, then release the lock.
    function write_ban(){
        global $filename, $bad_bot_ip, $content, $lock_dir;
        $handle = fopen($filename, 'r');
        $content .= fread($handle,filesize($filename));
        fclose($handle);
        $handle = fopen($filename, 'w+');
        fwrite($handle, $content,strlen($content));
        fclose($handle);
        rmdir($lock_dir);
        print "Goodbye!";
    }

    // If a lock has been left behind for over two minutes, clear it and retry.
    function stale_check(){
        global $lock_dir;
        if (fileatime($lock_dir) < time()-120){
            rmdir($lock_dir);
            if (make_lock_dir()!== False) write_ban();
        } else {
            exit;
        }
    }

    if (make_lock_dir()!== False) {
        write_ban();
    } else {
        stale_check();
    }

    // Email notification; kept inside the PHP tags so that it actually runs.
    $bad_bot_ip=StripSlashes($bad_bot_ip);
    $to="your_email@yourdomain.com";
    $email="bugalert@bugoff.com"; // Make up what you like to see coming in to your inbox
    $message="IP number $bad_bot_ip just got blocked from yoursitename.com.";
    mail($to,"Spider Blocked",$message,"From: $email\n");
    ?>
    ================

    My Questions:
    * How does that look? Do step 1-4 first and then wait a few days?
    * I am not quite sure what goes into the trap folder. Is the script just sending the bad bots there? Are there any files in there?
    * Do I add getout.php to the root, like www.example.com/getout.php ?

    thanks! I am hoping to join you all and get some of these pesky bots!
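The script builds its ban line by escaping the dots in `REMOTE_ADDR` and wrapping the result in a `SetEnvIf` rule. A slightly more defensive variant of that one step is sketched below. It is an assumption on my part rather than anything from the original script: `build_ban_line` is a hypothetical name, and it simply refuses to produce a line unless the input is a plain IPv4 address.

```php
<?php
// Hypothetical variant of the ban-line step: validate the address first,
// then escape the dots for use inside the SetEnvIf regular expression.
function build_ban_line(string $remoteAddr): ?string
{
    if (filter_var($remoteAddr, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null; // not a plain IPv4 address: write nothing
    }
    $escaped = str_replace('.', '\.', $remoteAddr);
    return 'SetEnvIf Remote_Addr ^' . $escaped . '$ getout' . "\r\n";
}

// Usage with a documentation-only address.
$line = build_ban_line('192.0.2.10');
```

Since whatever this returns ends up in .htaccess, returning nothing for unexpected input seems the safer default.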

  19. #19
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    The robots.txt changes go up first to be sure that the good bots don't get trapped, then you wait, then put up all the rest and do the chmods. Be sure to edit your email so you get a notification. Nothing goes in the trap directory. Everything else is just right.
    To test it go to yourdomain.com/getout.php and see if you get locked out. If you don't it's not right. If you do, reupload your .htaccess file and you're done. You will be happy you put it up. Thanks again, John!

  20. #20
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    thanks 2busy and thank you John!

    I was looking and searching for this thread yesterday looking under "bumpaw"... none such member to be found! glad I finally found it though!

    Well, can't wait for my first email!

  21. #21
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    * How does that look? Do step 1-4 first and then wait a few days?
    Until you think the good search engines have read your robots file.
    * I am not quite sure what goes into the trap folder. Is the script just sending the bad bots there? Are there any files in there?
    Just leave it empty. The script uses it. I went through and followed what it's for but don't remember.
    * Do I add getout.php to the root, like www.example.com/getout.php ?
    I put it in root. You could do otherwise if you change paths.

    I gave my script a creative file name so that it wouldn't be the same as all those following the original WMW thread, as a paranoid feature in case the bots are programmed to look for the file name.

    Be sure and feel free to PM me if you need help.
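On what the empty trap folder is for: going by the posted script, it creates `/trap/lock` with `mkdir()` just before editing .htaccess and removes it with `rmdir()` afterwards. Since `mkdir()` fails when the directory already exists, only one request at a time gets to rewrite the file. The same idea in isolation looks roughly like this, where `with_dir_lock` is a hypothetical name used only for illustration.

```php
<?php
// Sketch of the locking idea: mkdir() fails if the directory already
// exists, so whichever request creates it first holds the lock.
function with_dir_lock(string $lockDir, callable $work): bool
{
    if (!@mkdir($lockDir, 0777)) {
        return false; // another request holds the lock
    }
    try {
        $work();
    } finally {
        rmdir($lockDir); // always release, even if $work() throws
    }
    return true;
}

// Usage: the callback stands in for the .htaccess rewrite.
$lock = sys_get_temp_dir() . '/lock_' . uniqid();
$ran = false;
$gotLock = with_dir_lock($lock, function () use (&$ran) {
    $ran = true;
});
```

That is why the folder has to exist and be writable but normally looks empty: the lock only lives for the instant a ban is being written.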

  22. #22
    ABW Ambassador Lanadili's Avatar
    Join Date
    February 23rd, 2007
    Location
    Shreveport, LA
    Posts
    1,114
    The only thing I changed that is different would be #5. I added a style="display:none" so human visitors can't see it if their mouse happens to hover over it. I know the chance of that happening is slim to none, but I like to make sure it's none.

    Here is what my code looks like:

    <a href="/getlost.php" onclick="return false" style="display:none" rel="nofollow">
    <img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>

    I have to say also, since I've set up this little script on my pages, I have captured about 20 bad bots with it. Thanks again for posting this John, it has saved me a lot of time trying to find bad bots and worrying if other ones are roaming around on my site. Now I just sit back and wait for the emails.

  23. #23
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Quote Originally Posted by Lanadili
    The only thing I changed that is different would be #5. I added a style="display:none" so human visitors can't see it if their mouse happens to hover over it.
    That looks like a good idea. Glad to see it's working for you.

    I put my trap link up kind of high in the code as opposed to the bottom, just in case some bots are like Google and give priority to the items higher up. No idea if that's needed or not though.

  24. #24
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Thanks for the tip Lan!

  25. #25
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    Quote Originally Posted by Lanadili
    Here is what my code looks like:

    <a href="/getlost.php" onclick="return false" style="display:none" rel="nofollow">
    <img src="/images/tiny.gif" width="1" height="1" border="0" alt="" /></a>
    That is a good idea, I will add that in there! Thanks!
