  1. #1
    Full Member
    Join Date
    January 18th, 2005
    What's with
    Over the last two days I have received several thousand requests from a robot called heritrix. I went to their site and found a single page with a blurb that sounded like a sales pitch from the Internet Bubble days. Am I the only lucky person? I could ban them, but in cases like this I usually don't - even Google had to start scanning its first site once...


  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    2,511
    It's a Japanese-language site. It's also registered to a Japanese company.

  3. #3
    Comfortably Numb John Powell
    Join Date
    October 17th, 2005
    Bayou Country, LA
    Heritrix started hammering one of my sites tonight. About 20 different IPs have been blocked by my bad spider zapper.
    The only way that happens is if they disobey my robots.txt file.

    It looks like it had innocent enough beginnings as an open-source project at SourceForge.
    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
    Now it looks like it's up to no good.
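
For anyone wondering how a "bad spider zapper" of the kind mentioned above can work, here is a rough sketch, not the actual script used on that site. It assumes a hypothetical robots.txt that disallows a trap path for every crawler, and it blocks any IP that requests that path anyway. The names `ROBOTS_TXT`, `SpiderTrap`, `/trap/` and the `ExampleBot` user agent are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is told to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


class SpiderTrap:
    """Blocks any IP that requests a path robots.txt told it to avoid."""

    def __init__(self, robots_txt: str) -> None:
        self.parser = RobotFileParser()
        # parse() takes the file's lines; nothing is fetched over the network.
        self.parser.parse(robots_txt.splitlines())
        self.blocked: set[str] = set()

    def handle(self, ip: str, user_agent: str, path: str) -> int:
        """Return an HTTP-style status code for one request."""
        if ip in self.blocked:
            return 403  # already caught earlier
        if not self.parser.can_fetch(user_agent, path):
            # The client ignored robots.txt, so remember its IP.
            self.blocked.add(ip)
            return 403
        return 200


if __name__ == "__main__":
    trap = SpiderTrap(ROBOTS_TXT)
    print(trap.handle("10.0.0.1", "ExampleBot/1.0", "/index.html"))
    print(trap.handle("10.0.0.1", "ExampleBot/1.0", "/trap/page.html"))
    print(trap.handle("10.0.0.1", "ExampleBot/1.0", "/index.html"))
```

A well-behaved crawler never touches the trap path, so only clients that disobey the robots file end up in the block list. A real setup would presumably persist the list and enforce it at the web server or firewall rather than in memory.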
