  1. #1
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Question about Yandex
    I am seeing search terms by way of Yandex that I have never seen before.

    Search terms are showing as
    ????
    ????.???
    ?????.com
    Yes, I do have my robots.txt file set up, but Yandex keeps making slight changes to its URL.

    Is this a concern? Any comments?
    You must climb this mountain. There is no elevator. ---- Don't stick your finger in the liquid nitrogen.
    Carolina China

  2. #2
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Try using John Powell's BadSpider script so that they obey robots.txt or get blocked - and blocked every time they come in with a new IP. A search here turns it up and it does work.
    If they do obey but you want to block them, just use WHOIS to find the IP range and block them with .htaccess.
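    A minimal sketch of the IP-range approach, written in the same Apache 2.2 style as the rest of this thread. The range 192.0.2.0/24 is only a placeholder (a reserved documentation block), not an address range that belongs to Yandex; substitute whatever WHOIS reports.

    ```apache
    # Hypothetical range: 192.0.2.0/24 is a reserved documentation block.
    # Replace it with the range WHOIS reports for the crawler.
    Order allow,deny
    Allow from all
    Deny from 192.0.2.0/24
    ```

    With "Order allow,deny" the Deny lines are evaluated last, so a request from the listed range is refused even though everyone else is allowed.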

  3. #3
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Location
    jacked by sylon www.sylonddos.weebly.com
    Posts
    9,618
    I block robots by name - not IP.

    The following code in your .htaccess file blocks any robot you choose, no matter what IP address it switches to.

    Code:
    <Limit GET POST>
    #The next line modified by DenyIP
    order allow,deny
    #The next line modified by DenyIP
    #deny from all
    allow from all
    </Limit>
    <Limit PUT DELETE>
    order deny,allow
    deny from all
    </Limit>
    
    <Files 403.shtml>
    order allow,deny
    allow from all
    </Files>
    
    SetEnvIfNoCase User-Agent .*Twiceler.* bad_bot
    SetEnvIfNoCase User-Agent .*Java.* bad_bot
    SetEnvIfNoCase User-Agent ".*Sogou web spider.*" bad_bot
    SetEnvIfNoCase User-Agent .*YandexBot.* bad_bot
    SetEnvIfNoCase User-Agent .*spbot.* bad_bot
    
    
    order allow,deny
    deny from env=bad_bot
    allow from all
    As new robots show up just add a new line with their agent name.

    Vietnam Veteran 1966-1970 USASA
    ABW Forum Rules - Advertise At ABW

  5. #4
    Tax Paying Member
    Join Date
    November 14th, 2005
    Location
    Chapel Hill, NC
    Posts
    2,119
    Thanks for the help.

  6. #5
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Location
    jacked by sylon www.sylonddos.weebly.com
    Posts
    9,618
    You're welcome.

    I use that code on all of my sites and it stops the crawlers you don't want on your site dead in their tracks.
    Last edited by BurgerBoy; July 23rd, 2010 at 01:31 PM.


  7. #6
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,744
    I know this is an old thread, but it really has helped me out. I wanted to say a BIG THANK YOU to BurgerBoy for the code he posted to stop bad bots. It worked like a charm, and bad bots are now being served the 403s they deserve.

    Over the last couple of months, I've had a bot or bots inflating my click-through count, so I've been researching how to stop them. The code above has really made a difference.

    One bad bot that I would definitely add to the list is AhrefsBot. Once it hits your site, it seems to never leave. It's now eating 403s. So anyone using the code provided above by BurgerBoy may want to add this one to the list.

    SetEnvIfNoCase User-Agent .*AhrefsBot.* bad_bot


  8. #7
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    This thread made me laugh.. I saw the title, and not realizing the original date, was all ready to write "you should block Yandex"...

    Then I realized this was the exact thread that prompted me to ban YandexBot..

    Thanks BurgerBoy (almost 3 years later!).. and msladybug, thanks for the info re: AhrefsBot.

  10. #8
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,744
    Thanks teezone for wanting to help me out.

    Just a word of advice to anyone looking to add this code or any code to their .htaccess file: ALWAYS have a backup copy of your original .htaccess file. I tried several types of code before using the one BurgerBoy posted. What I found is that the proper code depends on the type of server and setup you are using. I tried some code I found on another website and it took my whole website down. Thankfully, I had a copy of my original .htaccess file and quickly uploaded it to recover my website.
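    One likely reason the same snippet behaves differently from server to server is the Apache version: Order/Allow/Deny are the 2.2-era directives, while Apache 2.4 uses Require and only accepts the old ones if mod_access_compat is loaded. A hedged sketch that guards each variant with IfModule; the two bot names are just examples taken from this thread, and it assumes the host allows these directives in .htaccess.

    ```apache
    # Flag unwanted crawlers by User-Agent (same idea as the code posted above).
    SetEnvIfNoCase User-Agent "YandexBot" bad_bot
    SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot

    # Apache 2.4 and later: mod_authz_core provides Require.
    <IfModule mod_authz_core.c>
        <RequireAll>
            Require all granted
            Require not env bad_bot
        </RequireAll>
    </IfModule>

    # Apache 2.2: fall back to Order/Allow/Deny.
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Allow from all
        Deny from env=bad_bot
    </IfModule>
    ```

    Keep the backup copy handy either way, so a rejected directive can be rolled back right away.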

  12. #9
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Location
    jacked by sylon www.sylonddos.weebly.com
    Posts
    9,618
    Here's the list of bots I'm banning now:

    SetEnvIfNoCase User-Agent .*Twiceler.* bad_bot
    SetEnvIfNoCase User-Agent .*Java.* bad_bot
    SetEnvIfNoCase User-Agent ".*Sogou web spider.*" bad_bot
    SetEnvIfNoCase User-Agent .*YandexBot.* bad_bot
    SetEnvIfNoCase User-Agent .*spbot.* bad_bot
    SetEnvIfNoCase User-Agent .*Baiduspider.* bad_bot
    SetEnvIfNoCase User-Agent .*libwww-perl.* bad_bot
    SetEnvIfNoCase User-Agent .*DotBot.* bad_bot
    SetEnvIfNoCase User-Agent .*Sogou-Test-Spider.* bad_bot
    SetEnvIfNoCase User-Agent ".*Alcohol Search.*" bad_bot
    SetEnvIfNoCase User-Agent .*ia_archiver.* bad_bot
    SetEnvIfNoCase User-Agent .*agbot.* bad_bot
    SetEnvIfNoCase User-Agent .*GeoHasher.* bad_bot
    SetEnvIfNoCase User-Agent .*TurnitinBot.* bad_bot
    SetEnvIfNoCase User-Agent .*JikeSpider.* bad_bot
    SetEnvIfNoCase User-Agent .*voilabot.* bad_bot
    SetEnvIfNoCase User-Agent .*Sosospider.* bad_bot
    SetEnvIfNoCase User-Agent .*Wayback.* bad_bot
    SetEnvIfNoCase User-Agent .*coccoc.* bad_bot
    SetEnvIfNoCase User-Agent .*proximic.* bad_bot
    SetEnvIfNoCase User-Agent .*Ezooms.* bad_bot
    SetEnvIfNoCase User-Agent .*YodaoBot.* bad_bot
    SetEnvIfNoCase User-Agent .*Exabot.* bad_bot
    SetEnvIfNoCase User-Agent .*Nutch.* bad_bot
    SetEnvIfNoCase User-Agent .*DigExt.* bad_bot
    SetEnvIfNoCase User-Agent .*AhrefsBot.* bad_bot
    SetEnvIfNoCase User-Agent ".*WebMoney Advisor.*" bad_bot
    SetEnvIfNoCase User-Agent .*mgmt.mic.* bad_bot
    SetEnvIfNoCase User-Agent .*SeznamBot.* bad_bot
    SetEnvIfNoCase User-Agent .*discoverybot.* bad_bot
    SetEnvIfNoCase User-Agent .*MJ12bot.* bad_bot
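
    As the list grows, the same bans can be collapsed into fewer lines with regex alternation. A minimal sketch using a handful of the names above; the grouping is only an illustration, not how the list is maintained on any particular site.

    ```apache
    # Each "|" separates one User-Agent substring; any match sets bad_bot.
    # Quote the pattern so names containing spaces stay in one argument.
    SetEnvIfNoCase User-Agent "(Twiceler|YandexBot|Baiduspider|AhrefsBot|MJ12bot)" bad_bot
    SetEnvIfNoCase User-Agent "(Sogou web spider|Alcohol Search|WebMoney Advisor)" bad_bot
    ```

    The existing "deny from env=bad_bot" line keeps working unchanged, because the variable name is the same.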

  14. #10
    ABW Ambassador
    Join Date
    January 4th, 2006
    Location
    USA
    Posts
    2,477
    I didn't notice the original post date at first either.

    Thanks for bumping this thread, msladybug. Good threads like this need to be bumped every once in a while.

    Big thank you to BurgerBoy for sharing the code!

  16. #11
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    You missed one Richard -

    SetEnvIfNoCase User-Agent .*Googlebot.* bad_bot
    Salty kisses, Sandy toes, and a Pirate's heart...

  18. #12
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Location
    jacked by sylon www.sylonddos.weebly.com
    Posts
    9,618
    Originally posted by Convergence:
    You missed one Richard -

    SetEnvIfNoCase User-Agent .*Googlebot.* bad_bot
    That's true. Too bad another SE can't take all the business from Grabble and wipe them out.

  20. #13
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    .*Wayback.* bad_bot
    One question...

    What is so bad about the Wayback Machine?

  21. #14
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Location
    jacked by sylon www.sylonddos.weebly.com
    Posts
    9,618
    It was on the site tying up the server all the time, and I got tired of it, so I banned it.
