  1. #1
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Blocking Bots in .htaccess
    I am not a programmer. I was looking through some things tonight and found this in the .htaccess file on one of my sites. It jumped out at me because Yandex was mentioned. This may sound dumb, but is it a good thing or a bad thing?

    RewriteEngine on
    # Options +FollowSymlinks
    RewriteCond %{HTTP_REFERER} yandex\.ru [NC]
    RewriteRule .* - [F]
    You must climb this mountain. There is no elevator. ---- Don't stick your finger in the liquid nitrogen.
    Carolina China

  2. #2
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
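    That rule refuses any request whose referrer contains yandex.ru. Note that it tests the Referer header, not the User-Agent, so it stops visits referred from Yandex rather than the Yandex spider itself. I use a slightly different treatment that uses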

    #Options +FollowSymLinks
    Options +SymLinksIfOwnerMatch
    and
    RewriteRule . - [F,L]
    instead. It issues a 403 error document to the bot.
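
    To make the Referer versus User-Agent distinction concrete, here is a minimal sketch of both checks side by side. The YandexBot pattern is borrowed from the User-Agent list later in this thread; treat it as an assumption about what the crawler sends rather than a verified string.

    ```apache
    RewriteEngine On

    # Referer-based: refuses visitors who arrive via a link on yandex.ru.
    RewriteCond %{HTTP_REFERER} yandex\.ru [NC]
    RewriteRule .* - [F]

    # User-Agent-based: refuses requests that identify themselves as YandexBot.
    RewriteCond %{HTTP_USER_AGENT} YandexBot [NC]
    RewriteRule .* - [F]
    ```

    The [F] flag returns 403 Forbidden and already implies [L], so [F] and [F,L] behave the same way.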

  3. #3
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Thanks.

    One more newbie question: should this code go between the begin and end markers of the WP section, or does the position not really matter?

  4. #4
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Mine are not in the WP section of .htaccess.
    One good place to learn more about .htaccess is at http://www.askapache.com/htaccess/mod_rewrite-tips-and-tricks.html and another is at http://perishablepress.com/press/2006/01/10/stupid-htaccess-tricks/ but I have run across some issues at that second site.
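
    On placement: WordPress rewrites everything between its # BEGIN WordPress and # END WordPress markers when permalink settings are saved, so hand-written rules placed inside can be lost. Its catch-all rule also ends with [L], so blocking rules are safest above that section. A rough sketch, assuming a stock WordPress install in the site root:

    ```apache
    # Custom blocking rules go first, outside the WordPress markers.
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} yandex\.ru [NC]
    RewriteRule .* - [F]

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>
    # END WordPress
    ```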

  5. #5
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Posts
    9,618
    You can block as many bots as you want using the following script in your .htaccess file.

    <Limit GET POST>
    #The next line modified by DenyIP
    order allow,deny
    #The next line modified by DenyIP
    #deny from all
    allow from all
    </Limit>
    <Limit PUT DELETE>
    order deny,allow
    deny from all
    </Limit>

    <Files 403.shtml>
    order allow,deny
    allow from all
    </Files>

    # Patterns are unanchored regexes, so no leading or trailing .* is needed.
    # Patterns that contain spaces must be quoted, or the extra words are read as variable names.
    SetEnvIfNoCase User-Agent Twiceler bad_bot
    SetEnvIfNoCase User-Agent Java bad_bot
    SetEnvIfNoCase User-Agent "Sogou web spider" bad_bot
    SetEnvIfNoCase User-Agent YandexBot bad_bot
    SetEnvIfNoCase User-Agent spbot bad_bot
    SetEnvIfNoCase User-Agent Baiduspider bad_bot
    SetEnvIfNoCase User-Agent libwww-perl bad_bot
    SetEnvIfNoCase User-Agent DotBot bad_bot
    SetEnvIfNoCase User-Agent MJ12bot bad_bot
    SetEnvIfNoCase User-Agent "Jakarta Commons" bad_bot
    SetEnvIfNoCase User-Agent Sosospider bad_bot
    SetEnvIfNoCase User-Agent bixolabs bad_bot
    SetEnvIfNoCase User-Agent ia_archiver bad_bot
    SetEnvIfNoCase User-Agent GeoHasher bad_bot
    SetEnvIfNoCase User-Agent "Indy Library" bad_bot
    SetEnvIfNoCase User-Agent Yeti bad_bot
    SetEnvIfNoCase User-Agent Mail\.Ru bad_bot
    SetEnvIfNoCase User-Agent LMQueueBot bad_bot
    SetEnvIfNoCase User-Agent VoilaBot bad_bot
    SetEnvIfNoCase User-Agent ScrapeBox bad_bot
    SetEnvIfNoCase User-Agent Huaweisymantecspider bad_bot
    SetEnvIfNoCase User-Agent larbin bad_bot
    SetEnvIfNoCase User-Agent Nutch bad_bot



    order allow,deny
    deny from env=bad_bot
    allow from all


    As you find new bots just add another line to the file.

    These are the bots I'm blocking right now.
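
    The order/allow/deny lines above are Apache 2.2 syntax; on Apache 2.4 they only work through mod_access_compat. A rough 2.4 equivalent might look like the sketch below. It assumes mod_setenvif and mod_authz_core are available to .htaccess on your host, and the bot list is shortened for illustration.

    ```apache
    # Several user agents in one pattern; the quotes are needed because the pattern contains spaces.
    SetEnvIfNoCase User-Agent "YandexBot|Baiduspider|MJ12bot|Sogou web spider" bad_bot

    # Allow everyone except requests flagged as bad_bot.
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
    ```

    Combining names with | keeps the file shorter, though one line per bot is easier to comment out when a block turns out to be a mistake.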

    Vietnam Veteran 1966-1970 USASA

  7. #6
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Thanks, 2busy. Looks like a gold mine of information.

  8. #7
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Thanks, BurgerBoy.

    You gave me this in the past, but there were some additional lines of code in my file that I did not remember putting there. I was trying to sort through them to find things to put in the .htaccess file of another site, which is becoming a Yandex playground.

    I only used your previous info on one site at the time and was going to do the others later. Oh well, it is now later.

    Again, thanks.

  9. #8
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    Moderator Note: This turned into some great general advice, so I've renamed it and moved it to the Programmer's Corner (and flagged it as a featured thread).
    Michael Coley
    Amazing-Bargains.com
     Affiliate Tips | Merchant Best Practices | Affiliate Friendly? | Couponing | CPA Networks? | ABW Tips | Activating Affiliates
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela

  10. #9
    Full Member gcarson's Avatar
    Join Date
    November 13th, 2009
    Posts
    383
    I've used the script from the "Spider Trap Anyone?" thread (ABestWeb Affiliate Marketing Forum) with good success. It seems to block the bad bots pretty well.

  11. #10
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    I use John's spider trap on all my sites and it works fine. I only wish I knew enough about it to edit out lines by using IP blocks where appropriate. Rather than having 300 lines of blocked bots as in:
    SetEnvIf Remote_Addr ^72\.44\.50\.81$ getout
    SetEnvIf Remote_Addr ^72\.44\.36\.69$ getout
    SetEnvIf Remote_Addr ^72\.44\.36\.184$ getout
    SetEnvIf Remote_Addr ^72\.44\.37\.210$ getout
    It would be nice to just have:

    SetEnvIf Remote_Addr ^72\.44\.*\.*$ getout
    But I don't know if that would work.

  12. #11
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
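    Drop the $ and it should work. In a regex, \.* means "zero or more literal dots", not "any octet", so with the $ the pattern can never match a full address. Without it, the pattern matches anything that starts with 72.44.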
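
    A tidier way to write the same prefix match, as a sketch. It assumes the spider trap already denies requests that carry the getout variable; that part of the trap is not shown in this thread.

    ```apache
    # One prefix match in place of the individual 72.44.x.x lines.
    # The trailing \. pins the second octet to exactly 44.
    SetEnvIf Remote_Addr ^72\.44\. getout

    # Alternative in the Apache 2.2 access syntax used earlier in this thread:
    # deny the whole 72.44.0.0/16 range by CIDR, with no regex involved.
    # Place it next to the existing deny lines, under the same "order" directive.
    deny from 72.44.0.0/16
    ```

    Either form blocks all 65,536 addresses in the range, so it is only worth doing when the whole range really is unwanted.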

  13. #12
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Thank you, Michael! I have been trimming it back, but the same lines keep getting added again, and I wished for a way to shorten the list. You can't always block a whole range, but when you can, it is obvious. Thank you.
