  1. #1
    ABW Ambassador Daniel M. Clark
    Join Date
    January 7th, 2006
    Location
    Houston, TX
    Posts
    2,082
    AWStats Question About Bots
    I'm using AWStats on my server, and I've noticed something very strange that I'm hoping someone can explain.

    Actually, I just now noticed that AWStats says there were 363 more uniques (1097 vs. 734) than Google Analytics, but that's not my main question.

    I noticed this line in the bots section:

    Unknown robot (identified by empty user agent string) | 25379+35 | 1.07 GB | 25 Jan 2010 - 22:47

    Those first two values are the number of hits and the amount of bandwidth sucked up. Now, I've got unlimited bandwidth, so I'm not super-concerned, but that does seem very, very excessive. Googlebot, by comparison, only sucked up 6.51 MB of bandwidth.
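    To put the 1.07 GB in perspective, it helps to divide it by the hit count. The sketch below parses a row of the AWStats robots table as pasted above. AWStats notes that numbers after + are successful hits on robots.txt, so the helper splits 25379+35 into those two counts. The parse_robot_row helper and its regex are hypothetical, fitted only to the rows quoted in this thread. It also assumes AWStats reports sizes in 1024-based units. Because "1.07 GB" is already rounded, the per-hit figure is only approximate.

```python
import re

# Hypothetical parser for one row of the AWStats robots table as pasted in
# this thread: "name | HITS+ROBOTS_TXT_HITS | SIZE UNIT | last visit".
ROW_RE = re.compile(r"\|\s*(\d+)\+(\d+)\s*\|\s*([\d.]+)\s*(KB|MB|GB)\s*\|")

# Assumption: AWStats uses binary (1024-based) units for bandwidth.
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_robot_row(row: str) -> dict:
    """Split a robots-table row into hits, robots.txt hits, bytes, KiB per hit."""
    m = ROW_RE.search(row)
    if m is None:
        raise ValueError(f"unrecognised row: {row!r}")
    hits = int(m.group(1))
    robots_txt_hits = int(m.group(2))  # the number after "+"
    size_bytes = float(m.group(3)) * UNITS[m.group(4)]
    return {
        "hits": hits,
        "robots_txt_hits": robots_txt_hits,
        "bytes": size_bytes,
        "kib_per_hit": size_bytes / hits / 1024,
    }


row = ("Unknown robot (identified by empty user agent string) "
       "| 25379+35 | 1.07 GB | 25 Jan 2010 - 22:47")
stats = parse_robot_row(row)
print(f"{stats['hits']} hits, about {stats['kib_per_hit']:.1f} KiB per hit")
```

    For the row above this comes to roughly 44 KiB per hit on average.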

    So, my question is this... should I be alarmed? If not alarmed, is there at least something I should be doing about it? My host can't be happy with over a gig of bandwidth given to bots (apparently) in just about 27 days.
    Daniel M. Clark
    Tech Manager
    Greg Hoffman Consulting

  2. #2
    Comfortably Numb John Powell
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Did you get that rawlog viewer going in AWStats? It shows a lot more detail than the Summary view. If it's a bad bot, you can try to block it with a trap. If it's a legitimate bot, you can add a rule to robots.txt to keep it out.


  3. #3
    ABW Ambassador Daniel M. Clark
    I do now
    I just got that up and running from the instructions you posted in the other thread - I was out of town at the end of the year and dropped the ball on finishing the setup.

    If all I have is "unknown bot" and "empty user agent string", can I do anything? Are there valid reasons why the user agent string would be empty?

  4. #4
    ABW Ambassador ladidah
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    Quote Originally Posted by Daniel M. Clark
    Actually, I just now noticed that AWStats says there were 363 more uniques (1097 vs. 734) than Google Analytics, but that's not my main question.

    Unknown robot (identified by empty user agent string) | 25379+35 | 1.07 GB | 25 Jan 2010 - 22:47
    So is 1097 the number of AWStats uniques for the month, or per day?

    I looked in my AWStats too and saw the same entry:

    Unknown robot (identified by empty user agent string) | 5922+97 | 195.48 MB | 27 Jan 2010 - 00:12

    If it's the same bot, it seems to be chewing up a lot more bandwidth on your site than on mine.

    Perhaps they are interested in media like videos or mp3 files, or it's more than one bot coming back several times from different IPs. I have a slew of them banned but some still go through.

  5. #5
    ABW Ambassador Daniel M. Clark
    That was for the month of January so far. I don't have any media files on my site... I don't think. Now that you mention it though, I think my images directory has some Photoshop files in it, and those can be pretty large. Still, they're not linked from anywhere at all, so unless the bot is just going in and taking everything regardless... do they do that?

  6. #6
    Comfortably Numb John Powell
    Quote Originally Posted by Daniel M. Clark
    Unknown robot (identified by empty user agent string) | 25379+35 | 1.07 GB | 25 Jan 2010 - 22:47
    By comparison, I have only 24 hits on that line and 29863+29 for Googlebot.

    Could you find the culprit in your rawlog? You can use the filter box to filter for robots.txt, since your unknown bot hit that 35 times. Increase "max lines" so all matching lines are shown. Then scroll down through the empty user agents and check whether they all come from one IP, which you could then block.

    I use Firefox's Find feature and search for ("-" "-") without the brackets. That brings up every line with no referrer and no user agent.
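    If the rawlog viewer's filter won't cooperate, the same "filter for robots.txt, then find the empty user agents" idea can be run against the raw access log. This is a minimal sketch that assumes Apache's "combined" log format (check the LogFormat your host actually uses). It keeps only requests whose user agent is "-" (optionally only robots.txt requests) and totals hits and bytes per IP, so you can see whether one address is responsible. The function name empty_agent_summary and the sample lines, which use reserved documentation IP addresses, are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def empty_agent_summary(lines, robots_only=False):
    """Return (ip, hits, bytes) for empty-user-agent requests, busiest IP first."""
    hits = Counter()
    size = defaultdict(int)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        if m.group("agent") not in ("", "-"):
            continue  # keep only requests with no user agent
        if robots_only and "/robots.txt" not in m.group("request"):
            continue  # the "filter for robots.txt" step
        ip = m.group("ip")
        hits[ip] += 1
        if m.group("size") != "-":  # Apache logs "-" when no body was sent
            size[ip] += int(m.group("size"))
    return [(ip, n, size[ip]) for ip, n in hits.most_common()]


# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [25/Jan/2010:22:47:01 -0600] "GET /robots.txt HTTP/1.0" 200 120 "-" "-"',
    '192.0.2.10 - - [25/Jan/2010:22:47:02 -0600] "GET /index.html HTTP/1.0" 200 45000 "-" "-"',
    '198.51.100.7 - - [25/Jan/2010:22:48:00 -0600] "GET / HTTP/1.1" 200 5000 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

for ip, n, total in empty_agent_summary(sample):
    print(ip, n, total)
```

    On a real server you would pass an open log file instead of the sample list, for example empty_agent_summary(open(path)), where path is wherever your host keeps the access log.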


  7. #7
    ABW Ambassador Daniel M. Clark
    Hm. I put robots.txt into the filter, which is set to 50000 lines, and it only pulled up 28 results, all of which seem to be legit (Googlebot, MSN and a few others I recognize). When I entered "-" "-" it didn't filter anything at all... I got the same list of results that I got from an empty search.

    Thanks for your insight on this stuff, John. It's very, very informative.

  8. #8
    Comfortably Numb John Powell
    Quote Originally Posted by Daniel M. Clark
    When I entered "-" "-" it didn't filter anything at all...
    You missed the part where I said to use Firefox's Find while your filtered log results are open. It's like filtering the filter.

    It seems like you should have more than 28 hits on robots.txt from the desired bots alone.

    Quote Originally Posted by From Awstats
    * Robots shown here gave hits or traffic "not viewed" by visitors, so they are not included in other charts. Numbers after + are successful hits on "robots.txt" files.


  9. #9
    ABW Ambassador ladidah
    Quote Originally Posted by Daniel M. Clark
    Now that you mention it though, I think my images directory has some Photoshop files in it, and those can be pretty large. Still, they're not linked from anywhere at all, so unless the bot is just going in and taking everything regardless... do they do that?
    It's quite possible that some bots have stumbled onto that directory. I would disallow that directory in robots.txt, and if those images have somehow already been hotlinked, I would stop that in .htaccess.
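    Before relying on a new Disallow rule, you can check how a compliant crawler would read it with Python's standard urllib.robotparser module. In this sketch the /images/ path is an assumption standing in for whichever directory holds the large files, and www.example.com is a placeholder host. robots.txt is only advisory: it keeps out bots that honor it, while anything that ignores it still has to be blocked at the server, for instance in .htaccess.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /images/ stands in for the directory to protect.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules without fetching a URL

# Ask how a crawler obeying robots.txt would treat a few paths.
for path in ("/images/header.psd", "/index.html"):
    allowed = parser.can_fetch("*", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```

    The "User-agent: *" group applies to every crawler that lacks a more specific group, so a bot sending any name is told to skip /images/ as long as it respects the file.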

    Quote Originally Posted by John
    Did you get that rawlog viewer going in Awstats?
    John, could you point me in the right direction on how to set that up? Are you referring to the AWStats plugin? If so, is there any risk of hackers exploiting the permissions or other vulnerabilities in that directory? I saw a few articles online about that issue.

  10. #10
    ABW Ambassador Daniel M. Clark
    I double-checked and I don't have any large files anywhere... I must have cleared them out at some point. I've been busy with other stuff, but I'll revisit this over the weekend and see if I can figure out more.


Similar Threads

  1. Awstats and WP-Admin
    By suzie250 in forum Programming / Datafeeds / Tools
    Replies: 2
    Last Post: September 24th, 2009, 12:47 AM
  2. AWSTATS Question
    By Kevin in forum Midnight Cafe'
    Replies: 6
    Last Post: August 8th, 2005, 03:31 PM
  3. webalizer vs awstats
    By Steveinid in forum Search Engine Optimization
    Replies: 3
    Last Post: August 24th, 2003, 03:17 PM
  4. Awstats in real time?
    By Javi in forum Midnight Cafe'
    Replies: 9
    Last Post: January 31st, 2003, 02:24 PM
  5. AWSTATS install help
    By ken in forum Programming / Datafeeds / Tools
    Replies: 2
    Last Post: September 16th, 2002, 09:31 PM
