  1. #1
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    MSNbot Hammering my Site
    Lately msnbot has been hitting me *really* hard. We're talking about 2 million pages/month. While I welcome it, it puts a serious strain on my server. This is a good test because it shows me what my server can handle, but it's causing some delays in my updates.

    The table in question is my lookup table. It's being hit with so many selects, updates and inserts that it's struggling. msnbot is hammering it with the selects while my update scripts are hitting it with the updates and inserts. I tried making a copy of this table where msnbot can select from one copy while I can continue to update the original copy. The problem here is I need to sync them up. First I'd have to delete everything from the copy then re-insert it from the original. This could leave a gap in functionality on the site while the syncing is going on, so this solution isn't quite ideal.
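    One way to avoid that gap, sketched under some assumptions: build the fresh copy off to the side under a temporary name, then swap it in with a rename instead of deleting and re-inserting in place. The sketch below uses Python's built-in sqlite3 so it runs anywhere; the table names (lookup, lookup_read, lookup_read_new) and columns are made up for illustration. On MySQL the equivalent swap would be a single RENAME TABLE statement listing both renames, which MySQL performs atomically.

```python
import sqlite3

def refresh_read_copy(conn):
    """Rebuild the read-only copy from the live table, then swap it in."""
    cur = conn.cursor()
    # Build the new copy under a temporary name; readers still see the old copy.
    # (A real table would also need its indexes recreated here, before the swap.)
    cur.execute("DROP TABLE IF EXISTS lookup_read_new")
    cur.execute("CREATE TABLE lookup_read_new AS SELECT * FROM lookup")
    # Swap inside one transaction so readers never see an empty table.
    cur.execute("BEGIN")
    cur.execute("ALTER TABLE lookup_read RENAME TO lookup_read_old")
    cur.execute("ALTER TABLE lookup_read_new RENAME TO lookup_read")
    cur.execute("DROP TABLE lookup_read_old")
    cur.execute("COMMIT")

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE lookup (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE lookup_read (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO lookup_read VALUES (1, 'stale')")
conn.executemany("INSERT INTO lookup VALUES (?, ?)", [(1, "fresh"), (2, "new")])

refresh_read_copy(conn)
rows = conn.execute("SELECT id, value FROM lookup_read ORDER BY id").fetchall()
print(rows)
```

    The update scripts keep writing to lookup the whole time; only the short rename step touches the table the bot reads from.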

    What I'm thinking now is to cache even more data into this table so the real time processing is reduced. I can create a new crontab script to cache this additional information, but it will add more overhead. Another way to go is to cache this additional information in real time. Then only re-cache it when the data is requested after a certain period of time.
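    The "cache in real time, re-cache after a certain period" idea can be sketched roughly like this. It is only an illustration: the TTLCache class, the expensive_lookup function and the one-day expiry are hypothetical names and values, and a real version would store the cached data in the lookup table rather than in memory.

```python
import time

class TTLCache:
    """Cache computed values and recompute them only after they expire."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (time stored, value)

    def get(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the expensive work
        value = compute(key)  # missing or expired: do the work once
        self._store[key] = (now, value)
        return value

calls = []

def expensive_lookup(key):
    """Stand-in for the real-time processing (hypothetical)."""
    calls.append(key)
    return f"result for {key}"

fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=86400, clock=lambda: fake_now[0])
cache.get("widget", expensive_lookup)  # computed on first request
cache.get("widget", expensive_lookup)  # served from the cache
fake_now[0] = 90000.0                  # a little more than a day later
cache.get("widget", expensive_lookup)  # expired, so recomputed
print(len(calls))
```

    The work is only done for pages that are actually requested, which is the trade-off against a crontab script that precomputes everything.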

    Anybody have any experiences with this? What do you recommend?

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  2. #2
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    I've put some more thought into this and the immediate concern is a bit serious so I've got a couple short term solutions. I can either adjust my robots.txt for fewer spider hits or add another gig of memory to the server.

    I'm leaning toward the latter because ideally I should be able to handle this traffic.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  3. #3
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    Quote Originally Posted by Snib
    I've put some more thought into this and the immediate concern is a bit serious so I've got a couple short term solutions. I can either adjust my robots.txt for fewer spider hits or add another gig of memory to the server.

    I'm leaning toward the latter because ideally I should be able to handle this traffic.

    - Scott
    Ah! English!

    I agree, add the RAM. MSN and the other "good" bots may pay attention to your robots.txt, but you don't want to be caught short if some rogue spider comes barreling through. Even if you ban all rogues upon detection, there are always new ones.
    There is no knowledge that is not power. ~Hemingway

  4. #4
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by Leader
    Ah! English!

    I agree, add the RAM. MSN and the other "good" bots may pay attention to your robots.txt, but you don't want to be caught short if some rogue spider comes barreling through. Even if you ban all rogues upon detection, there are always new ones.
    I just put in the ticket for the extra RAM. I hope all this msnbot activity is worth it. They're still the weakest in terms of traffic volume. For all the pages they're taking I would hope they make it worth my while.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  5. #5
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    While I agree that adding memory to your server is probably the best solution, have you checked to see whether MSNbot respects the "Crawl-Delay" setting in the robots.txt file?

    When I last checked, I think Google was the only engine to recognize it, but it's been a while.

  6. #6
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    MSN itself probably isn't...I've got some #1s there, and despite that, they hardly bring in any traffic compared to even Y.

    But I've found the value in more RAM most apparent when idiot rogue Chinese and Taiwanese robots manage to make it to my big site past my bank of IP bans. Those things can generate 4 page requests/second (with associated MySQL queries)--and can start rolling about 4AM and not stop until 3PM...this can slow the server to molasses and kill sales until you notice the problem and track it down.

    So overall it should be worth it, not so much in increased sales but as insurance.
    There is no knowledge that is not power. ~Hemingway

  7. #7
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    Is your server not very well optimized? I can't imagine a bot hitting it hard enough that a server couldn't handle it. Most bots won't do more than 20 or 30 requests per minute, and any decent server should be able to handle hundreds of requests per minute.
    Michael Coley
    Amazing-Bargains.com
     Affiliate Tips | Merchant Best Practices | Affiliate Friendly? | Couponing | CPA Networks? | ABW Tips | Activating Affiliates
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela

  8. #8
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by MichaelColey
    Is your server not very well optimized? I can't imagine a bot hitting it hard enough that a server couldn't handle it. Most bots won't do more than 20 or 30 requests per minute, and any decent server should be able to handle hundreds of requests per minute.
    It's optimized pretty well, but I have quite a lot of dynamic content going on. In this particular case I'm running a fulltext search query on every page msnbot hits. It's a special feature that we offer that's worked really well for customer use, but at 40-50 requests/min it's a lot of work for the server.

    I think the first step is to start caching this information in real time for a day or two at a time. That will only make a minimal impact since it will only reduce the real time work, not eliminate it. The next step is to start generating this information via the crontab, but that will require a lot of resources and time to complete. Basically this script will need to run several hundred thousand fulltext search queries every day. If this script can complete in a timely manner we'll be able to handle well over 100-200 requests/min.
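    As a rough feasibility check on the crontab approach, the time budget per query follows from simple arithmetic. The figure of 300,000 queries/day below is an assumed stand-in for "several hundred thousand", and both function names are made up for this sketch.

```python
def per_query_budget_ms(queries_per_day, window_hours=24.0):
    """Average time available per query if the job must finish inside the window."""
    return window_hours * 3600 * 1000 / queries_per_day

def sustained_rate_per_min(queries_per_day, window_hours=24.0):
    """Queries per minute the job has to sustain over the window."""
    return queries_per_day / (window_hours * 60)

assumed_queries = 300_000  # assumption, not a measured number
budget = per_query_budget_ms(assumed_queries)
rate = sustained_rate_per_min(assumed_queries)
print(f"{budget:.0f} ms per query, {rate:.0f} queries/min")
```

    Under that assumption each fulltext query gets about 288 ms on average if the job runs around the clock, and less if it has to fit into a shorter window.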

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  9. #9
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    First, this is a useful bit of information:
    Code:
    User-agent: msnbot
    crawl-delay: 120
    
    User-agent: msnbot-news
    crawl-delay: 120
    
    User-agent: msnbot-media
    crawl-delay: 120
    
    User-agent: msnbot-products
    crawl-delay: 120
    Now I didn't realize MSNBot had different names, but after looking at my server logs more closely I can see it's msnbot-products. They're hitting 1 or 2 product pages per second. Sometimes they'll even request 3 in a second. I set my crawl-delay to 5 seconds for "msnbot-products" and am waiting for them to grab it. Should be soon since they've grabbed it 60 times already today.
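    To put the 5-second setting in perspective, here is a small sketch that reads such a rule with Python's standard urllib.robotparser and works out the request ceiling it implies. The robots.txt text is a reconstruction of the rule described above, not a copy of the real file, and the parser only shows how Python interprets the rule, not how msnbot-products itself does.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed rule: 5-second delay for the product crawler only.
ROBOTS_TXT = """\
User-agent: msnbot-products
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("msnbot-products")  # seconds between requests
max_per_day = 86400 // delay                   # ceiling if the bot honors the delay
observed_low = 1 * 86400                       # 1 page/second, as seen in the logs
observed_high = 2 * 86400                      # 2 pages/second
print(delay, max_per_day, observed_low, observed_high)
```

    If the crawler honors the delay, that is at most 17,280 requests a day instead of the 86,400 to 172,800 a day implied by 1 or 2 pages per second.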

    So all this time I thought this was plain vanilla msnbot, but now I can see what's going on here. I looked at my stats and sure enough I've been getting traffic from shopping.msn.com. It's just a trickle but from what I can see MSN is showing my products in their search results when they have no paid results of their own. They've even got my prices, product names and thumbnails. Most likely they think I'm a merchant. They've only sent traffic to products with a single price. I looked over a few of their other results and I saw mostly merchants with one other price comparison engine, so maybe they've still got some kinks to work out.

    In light of all of this I've got plenty of good reason to slow their crawler down. I just hope it works!!

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  10. #10
    Not that fat. ReallyBigGuy's Avatar
    Join Date
    July 20th, 2005
    Location
    U wish U knew
    Posts
    745
    I get a lot of hits from msnbot too, but for some reason I just can't get decent rankings on msn. I do fine in google, but not msn.
    But maybe take a look at your robots, and exclude any ancillary pages that you may not want in there??

  11. #11
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by ReallyBigGuy
    I get a lot of hits from msnbot too, but for some reason I just can't get decent rankings on msn. I do fine in google, but not msn.
    But maybe take a look at your robots, and exclude any ancillary pages that you may not want in there??
    I guess the moral of this story is to find out which msnbot is hitting your site.
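    For anyone wanting to check their own logs, a minimal sketch of that check follows. It assumes the access log is in the common "combined" format with the user agent as the last quoted field; the sample lines, IP addresses, timestamps and the count_msnbot_variants name are all made up for illustration.

```python
import re
from collections import Counter

# The user agent is the last double-quoted field in a combined-format log line.
UA_RE = re.compile(r'"([^"]*)"\s*$')

def count_msnbot_variants(lines):
    """Count requests per msnbot variant (msnbot, msnbot-products, ...)."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue
        # Keep only the product token before the version, e.g. "msnbot-products".
        name = match.group(1).split("/")[0].strip().lower()
        if name.startswith("msnbot"):
            counts[name] += 1
    return counts

# Hypothetical sample lines, not taken from a real log.
sample = [
    '192.0.2.10 - - [01/Jan/2000:04:00:01 +0000] "GET /product/1 HTTP/1.1" 200 5120 "-" "msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.10 - - [01/Jan/2000:04:00:01 +0000] "GET /product/2 HTTP/1.1" 200 4987 "-" "msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.11 - - [01/Jan/2000:04:00:07 +0000] "GET / HTTP/1.1" 200 10240 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.12 - - [01/Jan/2000:04:00:09 +0000] "GET / HTTP/1.1" 200 10240 "-" "Mozilla/4.0 (compatible)"',
]
counts = count_msnbot_variants(sample)
print(counts.most_common())
```

    Whichever name comes out on top is the one to target with its own User-agent block in robots.txt.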

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  12. #12
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    So after adding the crawl-delay to my robots.txt this situation has improved substantially. My load is 10% of what it was during msnbot's foray. And better yet my traffic increased quite a bit from Google. I'm not sure if it's due to the speed increase or not. In any event I'm happy to have my site working at peak performance again.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  13. #13
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    Quote Originally Posted by Snib
    And better yet my traffic increased quite a bit from Google. I'm not sure if it's due to the speed increase or not.
    Sometimes that will help, but with such a fast reaction from them, I'd guess it's probably something else behind the G-rise.

    I have noticed a "bonus" rise after speeding up my main site, just not so fast. And, it's not too long-lived. Almost like G just wanted to give me a little reward for the bother.
    There is no knowledge that is not power. ~Hemingway
