  1. #1
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Google slamming my server (ignores crawl-delay)
    Today, my 'test server' is getting slammed by Google, which is submitting many requests per second. Since every page request triggers a search query against the 1.6-million-product database, it's effectively locking up my server.

    Google used to respect the "crawl-delay" setting in robots.txt (as I wrote here on ABW, in March 2007).

    However, Google no longer recognizes crawl-delay, and instead requires that a site register through Google Webmaster Tools in order to select a preferred crawler frequency (see http://www.google.com/support/webmas...n&answer=48620). That process requires verification, which means I have to access the server to make changes, and that is excruciatingly slow because Google is pounding my server....
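
    For reference, crawl-delay is a non-standard robots.txt directive that a crawler may or may not honor. Here is a minimal Python sketch of how a compliant crawler could read it, using the standard library's urllib.robotparser. The 10-second value and the "Googlebot" user-agent string are illustrative assumptions, and nothing here reflects how Google's crawler actually behaves:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to wait 10 seconds
# between requests (the value is an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler would sleep this many seconds between requests.
delay = parser.crawl_delay("Googlebot")
print(delay)
```

    The directive is only a request: the server has no way to enforce it.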

  2. #2
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    You might want to upgrade your server. For that many products you'll need at least 1-2 GB of RAM dedicated to your database. It's also a good idea to get a multi-core CPU to handle requests faster. If you aren't running a dedicated server, I think now is the time to consider one.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  3. #3
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    It's a freaking TEST SERVER which I do NOT want Google to pound.

    To complicate matters further, the programming of Datafeed Studio blocks my ability to "verify" through Google Webmaster Tools -- apparently mod_rewrite redirects everything. The index.php file contains no HTML in which to insert a meta tag, and after I uploaded the special .html file Google provided, I found that it can't be loaded -- Datafeed Studio shows the home page instead. And Google refuses to verify: "We've detected that your 404 (file not found) error page returns a status of 200 (Success) in the header."

    So now the ONLY option is for me to go in and block Google completely from my test server.
    Last edited by markwelch; July 24th, 2008 at 03:56 PM.

  4. #4
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by markwelch
    It's a freaking TEST SERVER which I do NOT want Google to pound.

    So now the ONLY option is for me to go in and block Google completely from my test server.
    Sounds like a plan.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  5. #5
    Antisocial Media Expert ProWebAddict's Avatar
    Join Date
    March 25th, 2006
    Location
    Go Daddy
    Posts
    1,109
    Google also chose to ignore my noindex, nofollow, and I had to go through whatever process they have listed to remove a site from the index. Then, to top it off, the site said it would only remove it for a few months, when I don't want the site in Google at all.

    I guess they do what they want these days.

  6. #6
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    The problem with blocking Google entirely (apart from the fact that it might be several hours or as long as a day before Google re-checks the robots.txt file) is that it prevents me from doing AdWords testing with this server. (I originally thought that "this completely ruins my plans for my next few days of work," but I'll try changing robots.txt to allow just the Google-Ad-Bot and not any others.)
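
    A rough sketch of what such a robots.txt might look like, checked with Python's standard urllib.robotparser. I'm assuming the AdWords crawler identifies itself with the token "AdsBot-Google"; the sample path is hypothetical, and this only shows how a parser that follows the rules would interpret the file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow only the AdWords crawler (assumed token
# "AdsBot-Google") and block every other user agent.
ROBOTS_TXT = """\
User-agent: AdsBot-Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(user_agent, path="/search?q=widgets"):
    """Return True if the rules above let this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)

print(allowed("AdsBot-Google"))
print(allowed("Googlebot"))
```

    This only helps with crawlers that fetch robots.txt and honor it; the file is advisory, not enforced by the server.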
    ___

    Some additional info: The requests did not actually identify themselves as any of the Googlebots -- instead, the user agent was set to "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" for these requests, but they came from two Google IP addresses (74.125.16.39 and 66.249.85.130).

    Google submitted 4,599 page requests between 14:25 and 15:51 today (Eastern Time). That's an average of about one per second, but really there were many brief bursts and pauses, so that there were frequently intervals during which six to eight new requests were submitted each second, for several seconds; during some one-second intervals, a dozen requests were submitted.
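
    The arithmetic behind those numbers can be sketched in Python. The totals and times come from the figures above; the calendar date is a placeholder, and the sample timestamps are made up purely to show how per-second bursts could be counted from a log:

```python
from collections import Counter
from datetime import datetime, timedelta

def average_rate(total_requests, start, end):
    """Average requests per second over the whole window."""
    return total_requests / (end - start).total_seconds()

def peak_per_second(timestamps):
    """Largest number of requests that fall within any one clock second."""
    counts = Counter(ts.replace(microsecond=0) for ts in timestamps)
    return max(counts.values()) if counts else 0

# Window reported above: 14:25 to 15:51 (the date itself is a placeholder).
start = datetime(2008, 7, 24, 14, 25)
end = datetime(2008, 7, 24, 15, 51)
avg = average_rate(4599, start, end)
print(f"{avg:.2f} requests/second on average")

# Made-up timestamps standing in for parsed access-log lines.
sample = [
    start + timedelta(milliseconds=100),
    start + timedelta(milliseconds=500),
    start + timedelta(milliseconds=900),
    start + timedelta(seconds=1, milliseconds=200),
    start + timedelta(seconds=5),
]
peak = peak_per_second(sample)
print(f"{peak} requests in the busiest second of the sample")
```

    The average hides the bursts, which is why the per-second count is the more useful number for a server that runs a database query on every request.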

    None of this would be a problem for a server delivering static web pages (which is how I normally prefer to configure my sites), but the test server is delivering all content dynamically, querying the database each time a request is received. And of course any caching is irrelevant since every request from Google is a new unique query.

    I've updated the robots.txt to exclude all agents.
    Last edited by markwelch; July 24th, 2008 at 04:20 PM.

  7. #7
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Martin (author of DS) provided me with instructions on where to edit an included file to insert the Google-Webmaster-Tools meta tag.

    Unfortunately, after I was able to access the Webmaster Tools for this site, Google informed me:

    > "The rate at which Googlebot crawls is based on many factors. At this time, crawl rate is not a factor in your site's crawl. If it becomes a factor, the Faster option below will become available." <

    In other words, not only has Google chosen to ignore the crawl-delay directive which it used to respect, it now invites webmasters to use Webmaster Tools to set the crawl rate, and then informs us that it will still ignore our crawl-rate request and do whatever it wants.

    Hopefully, Google will continue to respect the basic robots.txt guidelines, but after this experience I'm not nearly as confident about that as I was yesterday.
