  1. #1
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    BAD BOTS (not sure if this is the right forum)
    I have gathered a list HERE of IPs and user agents of bots, spiders and crawlers that do NOT follow the robots.txt. Feel free to use them on your sites if you want. I will try to update the list about monthly as I catch more.

    Mods.... If this needs to be in a different forum please move it for me.
    One day parasites and their ilk will be made illegal, I bet a few Lawyers will be pissed off when the day comes.
    Mr. Spitzer is fetching it nearer

    YouTrek

  2. #2
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Bad bot: ShopWiki.com
    My servers started getting bombarded with requests from a bot that identifies itself as associated with "ShopWiki" -- the bot checked my robots.txt file ONCE last week:

    2005-06-23 10:29:33 W3SVC2136123179 GET /robots.txt - 80 - 64.124.165.25 ShopWiki/1.0+(++http://www.shopwiki.com/)

    (it had earlier sent requests to my server without checking robots.txt, and it has not re-checked that file since Thursday morning, even though it continues to return to bombard my server with requests).

    When it visits, it requests 12-15 pages per second, until my server is totally swamped and stops generating timely responses. (Note that 12-15 requests per second is not inherently an overload level, but when sustained and intermixed with other requests, including requests that trigger database queries, it is nearly guaranteed to crash most servers.)

    Then it goes away for a while, only to return after an hour or so, without re-checking robots.txt -- and again it sends 12-15 page requests every second until my server chokes.

    Yesterday, I added the bot's IP address ( 64.124.165.25 ) to my server block list, and of course the attacks ended.

    Today, I received an email from the site administrator, who claimed that the ShopWiki bot honors the robots.txt file directives -- so I added a line to limit that bot, and removed the IP block. Within a few hours, the bot was again assaulting my server -- without EVER checking the robots.txt file again.

    I re-installed the IP block just a few moments ago, in order to restore access to my server to the rest of the web, and I urge other webmasters to do the same. The shopwiki folks don't offer any human contact info, and the admin emails are answered anonymously; the WHOIS info is private, too.
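
    For anyone who wants to spot this pattern in their own logs, here is a minimal Python sketch. It assumes every log line has the same ten space-separated fields as the ShopWiki line quoted above (date, time, site, method, path, query, port, user, client IP, user agent); the function names and the threshold are made up for illustration, not taken from any server tool.

```python
from collections import Counter


def peak_rate_per_ip(lines):
    """Return {ip: highest number of requests seen in any single second}."""
    per_second = Counter()
    for line in lines:
        fields = line.split()
        # Skip "#Fields:" style header lines and anything too short to parse.
        if line.startswith("#") or len(fields) < 10:
            continue
        date, time, ip = fields[0], fields[1], fields[8]
        per_second[(ip, date, time)] += 1

    peaks = {}
    for (ip, _date, _time), count in per_second.items():
        peaks[ip] = max(peaks.get(ip, 0), count)
    return peaks


def flag_fast_clients(lines, threshold=12):
    """List the IPs that reached `threshold` requests within one second."""
    peaks = peak_rate_per_ip(lines)
    return sorted(ip for ip, peak in peaks.items() if peak >= threshold)
```

    Calling flag_fast_clients(open("ex050623.log")) (a hypothetical file name) would return the IPs worth a closer look. A one-second peak is a crude measure, so treat the result as a list of candidates rather than an automatic ban list.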

  3. #3
    Member Chocolate_Chicken's Avatar
    Join Date
    January 19th, 2005
    Location
    The Hen House
    Posts
    1,227
    It might not be a bad idea to consider listing other bots which, though they may obey the robots.txt, should be blocked anyway as they serve no other purpose than to waste bandwidth.

  4. #4
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I've updated my list of bad bots (link in post #1). If anyone notices a bot that I should not have banned, please let me know. I have removed a few from the last list that I think are not bad ones.

    I've sorted them by date caught this time. I think this should make it easier for you to find any new ones you don't have, so you can add their IPs to your list.

    Hope this helps some people.

    [edited to add] I am going to add some email scrapers' IPs to the list soon. I will repost when I have done it.

  5. #5
    Crazy like a fox suzigeek's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,096
    Thanks Gordon. Just the task I was getting to today... I might have a few more to add, but I have to cross-check to see if they are already listed.
    Suz~~GearGirl~~

  6. #6
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    744
    Can you explain how to create a server block list?

  7. #7
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I caught this asshole trying to crawl 9 different sites today using 9 different IPs from his/her block.

    When I update the file it will have them all in.

    % Information related to '193.254.184.0 - 193.254.191.255'

    inetnum: 193.254.184.0 - 193.254.191.255
    netname: CRONON-NET
    descr: -----------------------------------
    descr: Cronon AG, Niederlassung Regensburg
    descr: Serverhousing, Domainregistrierung,
    descr: Internet-Services fuer
    descr: Gewerbetreibende
    descr: WWW: http://www.cronon-ag.de/
    descr: Bei Missbrauch, bitte Mail an
    descr: *****@cronon-ag.de
    descr: -----------------------------------
    country: DE
    admin-c: FWH1
    tech-c: FWH1
    status: ASSIGNED PI
    mnt-by: RIPE-NCC-HM-PI-MNT
    mnt-lower: RIPE-NCC-HM-PI-MNT
    mnt-by: ABCAG-MNT
    mnt-by: CRONON-MNT
    mnt-routes: ABCAG-MNT
    changed: **********@ripe.net 20021217
    changed: **********@ripe.net 20021217
    changed: *****@cronon-ag.de 20030320
    source: RIPE

    person: Florian Wilhelm Heinz
    address: Grasgasse 1
    address: 93047 Regensburg
    address: Germany
    phone: +49 175 179 25 45
    e-mail: *****@cronon-ag.de
    nic-hdl: FWH1
    notify: *****@cronon-ag.de
    mnt-by: ABCAG-MNT
    changed: *****@cronon-ag.de 20040130
    source: RIPE

    % Information related to 'FWH1'

    route: 193.254.184.0/21
    descr: CRONON-NET
    origin: AS25504
    notify: *****@cronon-ag.de
    mnt-by: ABCAG-MNT
    changed: *****@cronon-ag.de 20021217
    source: RIPE
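
    Since the crawler hops between addresses in the same block, it can be easier to block the whole range than to list nine IPs one by one. A small sketch, using Python's standard ipaddress module, that turns the inetnum range from the WHOIS record above into CIDR form and tests an address against it (the helper name in_blocked_range is made up here):

```python
import ipaddress

# First and last address of the inetnum range shown in the WHOIS output.
first = ipaddress.ip_address("193.254.184.0")
last = ipaddress.ip_address("193.254.191.255")

# Collapse the range into the smallest list of CIDR networks covering it.
networks = list(ipaddress.summarize_address_range(first, last))
print(networks)  # [IPv4Network('193.254.184.0/21')]


def in_blocked_range(ip, blocked=networks):
    """Return True if `ip` falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)
```

    The result matches the route line in the same WHOIS record (193.254.184.0/21), which is the form most deny lists and firewalls accept.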

  8. #8
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I've just updated the list, but I've not had time to add the mail scrapers yet.

    Here is the list

    I have a question for some knowledgeable person ......
    Does anybody know why some of the UserAgent names have suddenly stopped being captured? (See the missing names in the list.)

    I am still capturing their IPs, but just this month some of the UserAgent names are not being captured.

  9. #9
    Member
    Join Date
    January 18th, 2005
    Posts
    65
    bot tool
    Try http://www.botbuster.com. It's a useful script, aimed more at the adult market, but it might help.
    Last edited by Hawaii5; July 28th, 2005 at 02:54 AM. Reason: grammar

  10. #10
    Newbie
    Join Date
    April 10th, 2005
    Posts
    28
    Gordon
    I'm not sure that I understand all of this, but a good number of the bots (IPs) that you show in your list appear in my logs. The most frequent is .NET CLR (with several IPs) followed by Voila.Bot and Fun Web Products. Are all of these bad? Do I need to add every IP that they show to my blocked IP list?

  11. #11
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    justsue

    I don't know if they are all bad bots. All I know is that they had to follow a link that they should not follow, and they also had to enter a directory they should not enter (if they follow the robots.txt file), so this gets them banned from my sites.

    The link they have to follow cannot be reached by the average surfer, so as far as I am concerned they are bad for me.

    I do keep a log of the usernames, which I look at regularly, so if I happen to see a good bot (in the past I have forgotten to add the directory to the robots.txt on a site, and then a good bot gets banned) I can just remove it from the database.

    I'm sure Voila.Bot belongs to a French search engine, so I guess I should remove that one from my list. This is why I asked if anyone knows of any good bots in the list.
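
    The trap described here (a hidden link into a directory that robots.txt disallows) can be sketched in a few lines of Python. This is only an illustration under assumptions: the /trap/ directory name and both function names are hypothetical, and the real setup stores its bans in a database. The sketch also guards against the mistake mentioned above by refusing to ban anyone when the trap directory is missing from robots.txt.

```python
from urllib import robotparser

TRAP_DIR = "/trap/"  # hypothetical directory name, for illustration only


def trap_is_disallowed(robots_txt, trap_dir=TRAP_DIR):
    """Return True if robots.txt tells every crawler to stay out of trap_dir."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", trap_dir)


def ips_to_ban(requests, robots_txt, trap_dir=TRAP_DIR):
    """Collect client IPs that requested anything under the trap directory.

    `requests` is an iterable of (ip, path) pairs taken from the access log.
    If robots.txt does not disallow the trap, compliant bots may wander in
    too, so nothing is banned.
    """
    if not trap_is_disallowed(robots_txt, trap_dir):
        return set()
    return {ip for ip, path in requests if path.startswith(trap_dir)}
```

    Anything this returns still deserves a quick look before it goes on a permanent list, for the same reason good bots are removed by hand here.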

  12. #12
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I've just updated the bad bots list if anybody is interested. list here

  13. #13
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I've updated the list again if anyone wants to view it.

  14. #14
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Bad bot at 81.202.156.214
    In my logs tonight, I found exactly 501 sequential page requests from a new robot at IP address 81.202.156.214 (which is apparently in Spain), coming at a pace of 8-10 requests per second. It's clearly a 'bot' of some kind -- my hunch is that it is a test bot (since it seems to cap at 500 requests) but it does not identify itself. User-Agent is "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+98)" and referrer field is blank.

    I've added this IP to my server's "deny" list.

  15. #15
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    Just a warning to those who only ban unwanted bots/scrapers by Useragent.

    Before 7-5-2005, I captured the Useragent of every bot or scraper I caught. Since 7-5-2005 I have caught another 139 bad bots/scrapers that do not follow the robots.txt, and 69 of them could only be identified by their IP address because their Useragent could not be captured. So, to my simple way of thinking, if any of these are well-known bad bots that you have banned only by Useragent and not by IP, I think they will still be able to access your sites.

    If anyone has any different thinking/reasoning than me on this I would be interested in their point of view.

    I've updated my list. You can see the number of hidden Useragents caught to date.
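
    To make the point concrete, here is a rough Python sketch of a check that looks at the IP first and only then at the Useragent, so a bot that sends a blank or faked Useragent is still caught. The addresses and the ShopWiki name come from earlier posts in this thread; the list layout and the function name is_blocked are assumptions for the example.

```python
import ipaddress

# Useragent substrings to refuse (lowercase), and networks to refuse outright.
BLOCKED_AGENTS = ("shopwiki",)
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("64.124.165.25/32", "81.202.156.214/32", "193.254.184.0/21")
]


def is_blocked(ip, user_agent):
    """Return True if the request should be refused.

    The IP check runs first, so an empty or disguised Useragent does not
    let a known bad address through.
    """
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    agent = (user_agent or "").lower()
    return any(name in agent for name in BLOCKED_AGENTS)
```

    A Useragent-only list corresponds to the last two lines alone, which is exactly what a bot with a hidden Useragent slips past.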

  16. #16
    Member
    Join Date
    April 18th, 2005
    Posts
    81
    YahooSeeker/1.2 (yahooseeker at yahoo-inc dot com ; http://help.yahoo.com/help/us/shop/merchant/)

    You block Yahooseeker and BecomeBot? I do get referrals from Become though they're basically a new search engine.

    As for Yahooseeker, I thought that was Yahoo's version of Froogle? So why block it? Kind of drastic isn't it?

    I was so happy to be in Froogle, which provided me with so many free leads until they removed me after realizing that I wasn't a merchant but an affiliate.

  17. #17
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    noxtrumbot (noxtrum.com)
    Non-compliant bot: Requested robots.txt but indexed all files in the "Disallow" directories.

    194.224.199.47 noxtrumbot/1.0+(crawler@noxtrum.com)
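
    One way to confirm this kind of non-compliance from the logs is to replay the bot's requested paths against the site's robots.txt. A minimal sketch with Python's standard urllib.robotparser; the function name disallowed_hits is hypothetical, and the paths would come from your own log lines for that bot.

```python
from urllib import robotparser


def disallowed_hits(robots_txt, user_agent, paths):
    """Return the requested paths that robots.txt forbids for this user agent."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]
```

    If the bot fetched robots.txt and this list is still non-empty, it read the rules and ignored them.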
