  1. #1
    Member genbintn's Avatar
    Join Date
    November 29th, 2005
    Location
    Lincoln, NE
    Posts
    70
    Googlebot Info Please
    Since I am under the impression that this particular forum isn't a rant area over google and their spidering websites, could someone please point me towards a good forum?

    My husband is threatening to bill Google and Yahoo both for bandwidth consumption because their spiders are hitting our websites every day, over 2,000 times (these are not affiliate websites, but those we host). It wouldn't be so bad, but they spider the same pages over and over and over and over again.

    TIA for any leads,

    Bridgett

  2. #2
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    That's actually a good thing. You want Google to spider your pages as often as possible so they have the most up to date information from your site. If you're updating daily, Googlebot will come daily for the updates. Bandwidth is cheap, so I don't think you need to worry about that. What exactly is the problem with Googlebot's behavior?

    You could always control Googlebot with a robots.txt file, but try not to limit it too much otherwise you'll hurt your traffic from Google.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  3. #3
    general fuq mrbshouse's Avatar
    Join Date
    January 18th, 2005
    Location
    Argieville
    Posts
    1,381
    You can also add meta tags to tell the spiders how often to visit. This may or may not help you since you have to add it to each page and they may not always look for this tag.

    If you have that much traffic you can always ban the spiders in your robots.txt and that should save you a bunch of traffic...from spiders and visitors

    good spiders=visitors ;-)

  4. #4
    ABW Ambassador DesignerWiz's Avatar
    Join Date
    January 18th, 2005
    Location
    U.S.A
    Posts
    2,777
    You can also tell the bot how often to revisit particular pages by using a google site map. http://www.google.com/webmasters/sitemaps/login
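
    For anyone who wants to see what such a sitemap looks like, here is a minimal sketch in Python. It assumes the current sitemaps.org 0.9 namespace and uses made-up example.com URLs; build_sitemap is a hypothetical helper, not part of any Google tool. Note that the protocol describes changefreq as a hint to crawlers, not a command, so the bot may still visit more or less often than the file says.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed here).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML text for (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # changefreq is only a hint about how often the page changes.
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: one that changes daily, one that rarely changes.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "daily"),
    ("http://www.example.com/archive.html", "monthly"),
])
print(sitemap_xml)
```

    The output would typically be saved as sitemap.xml and submitted through the sitemaps page linked above.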
    Ray Thomas
    Webmaster Resources: http://DesignerWiz.com
    ABW Board Category: Programming / Coding
    http://forum.abestweb.com/forumdisplay.php?f=190

  5. #5
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    Quote Originally Posted by genbintn
    Since I am under the impression that this particular forum isn't a rant area over google and their spidering websites, could someone please point me towards a good forum?

    My husband is threatening to bill Google and Yahoo both for bandwidth consumption because their spiders are hitting our websites every day, over 2,000 times (these are not affiliate websites, but those we host). It wouldn't be so bad, but they spider the same pages over and over and over and over again.

    TIA for any leads,

    Bridgett
    Wha?! We rant about Goofle a lot (note my spelling of their name!). Whoever gave you the impression that they're on a true pedestal here is mistaken. Perhaps s/he thinks this place has mutated into WMW~? Just as long as you don't threaten to blow up the place or something like that, or make too nasty of personally-directed comments (like saying, Larry Page can go and [insert obscene or violent suggestions]), it should be okay...

    On the other hand, that doesn't mean we'll *agree* with just any rant. No forum will guarantee you its agreement...

    Anyway, you can ban G-bot with robots.txt. But, 2000 times shouldn't eat that much bandwidth!

    Plus, any hosting customer with even 1 neuron will leave a host that dares to ban Googlebot or any other big engine's bot (Yahoo, MSN). Big engines' bots bring in traffic (once the pages get in the index)! To ban Googlebot and the others from a site is to doom that site to obscurity in the engines, which in most cases is the same as dooming it to obscurity, period.

    Tell your husband he's being penny-wise and pound-foolish. A #1 in Google can be worth several thousand dollars/month (if it's under a good keyword/phrase)! And without the Googlebot's spidering, you've got no chance of getting that, or any other, rank.
    There is no knowledge that is not power. ~Hemingway

  6. #6
    Member genbintn's Avatar
    Join Date
    November 29th, 2005
    Location
    Lincoln, NE
    Posts
    70
    Adding clarification to my initial post
    Hubby and I run a small hosting service, mainly genealogy sites (and of course some of my affiliate sites as well), so he is able to keep tabs on what is going on with the servers and the sites on a daily basis. (You can't believe the spam messages we don't get because of this diligence.)

    So what he is seeing is Google, MSN, Yahoo and a few other search engines slamming the websites every day. As a for instance, from 12:01 AM to 4:11 PM CST Google has spidered 9,483 times on just one website alone. This website is hit by Google EVERY DAY and very rarely has any new information on it (I know, I haven't had time to update it). Since he is helping me write up these stats and looking over my shoulder, he said "and Yahoo just !@#$ lives over here."

    We use the robots.txt file, and have heavily excluded a lot of areas in that one website. It has reduced the number of pages being spidered, but not the frequency of their visits. I'd love for them to come every other day, or every 3rd day. Every day just seems sooooooooooooooo excessive in my book. Even with all this griping, we have no intention of banning search bots; we would just love to be able to control them a tad more.

    What is so funny is that they hit the above site repeatedly, but another site I run, which does change on a daily basis, they have only spidered 150 times today (same time frame). Go figure.

    You can also tell the bot how often to revisit particular pages by using a google site map.
    I have just come off that site. I haven't signed up yet, but will do so very shortly. I was hoping that was one of the perks of using the google site map. Thanks for clarifying it.

    Thanks a lot, everyone, you have helped a lot (letting me rant helps LOL). If anyone else has any ideas, they will be most appreciated.
    *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
    Bridgett
    The Country Store: http://www.genealogyforyou.com
    Unique Gifts for the Genealogy Buff:
    Genealogy For You at CafePress.com: http://www.cafepress.com/genealogyforyou

  7. #7
    general fuq mrbshouse's Avatar
    Join Date
    January 18th, 2005
    Location
    Argieville
    Posts
    1,381
    If you're concerned about the hosted sites, it sounds like you should look into the agreed amounts of bandwidth per account.

    If these are sites that you run, affiliate or otherwise, I'm guessing that they are PHP-driven. Have you looked at the pages that the bots are visiting? I'm not talking about stats, I mean logs...look into WHAT they are looking at. Is it the same page over and over, or is it generated pages in, say, a calendar or something similar?

    How many pages are in the site that was spidered 9,400 times?

    How much bandwidth are we talking about anyway?
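
    One rough way to answer both questions from the raw logs is a short script like the one below. It assumes the Apache "combined" log format and matches the crawler by the Googlebot token in the user-agent string; the sample lines are made up for illustration, and summarize_bot_hits is a hypothetical helper name.

```python
import re
from collections import Counter

# Apache "combined" log format is assumed; adjust the pattern for other servers.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_bot_hits(lines, bot_token="Googlebot"):
    """Count requests per path and total bytes served to one crawler."""
    hits = Counter()
    total_bytes = 0
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or bot_token not in match.group("agent"):
            continue  # skip unparsable lines and other visitors
        hits[match.group("path")] += 1
        size = match.group("size")
        total_bytes += 0 if size == "-" else int(size)
    return hits, total_bytes


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Made-up sample lines for illustration only.
sample = [
    '66.249.66.1 - - [10/Mar/2006:00:01:02 -0600] "GET /tree/person.php?id=1 HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Mar/2006:00:01:05 -0600] "GET /tree/person.php?id=1 HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT_UA,
    '66.249.66.1 - - [10/Mar/2006:00:01:09 -0600] "GET /index.html HTTP/1.1" 304 - "-" "%s"' % GOOGLEBOT_UA,
    '10.0.0.5 - - [10/Mar/2006:00:02:00 -0600] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits, total_bytes = summarize_bot_hits(sample)
for path, count in hits.most_common():
    print(count, path)
print("bytes served to the bot:", total_bytes)
```

    To run it against a real log, pass an open file instead of the sample list, e.g. summarize_bot_hits(open("access.log")), where access.log is a placeholder path. The same page showing up again and again at the top of the list would point to a loop rather than a normal crawl.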

  8. #8
    http and a telephoto
    Join Date
    January 18th, 2005
    Location
    NYC
    Posts
    17,708
    Bridgett, there are a ton of people on this board that would love to have your "problem". Take heed of what you are hearing and show hubby this thread. What you are seeing on your site is a GOOD thing, not something to try and control or rant about.
    Deborah Carney
    TeamLoxly.com BookGoodies.com ABCsPlus.com

  9. #9
    Member genbintn's Avatar
    Join Date
    November 29th, 2005
    Location
    Lincoln, NE
    Posts
    70
    Hi all,

    Deb, I *am* thankful for the spider visitations, I know how valuable it is being indexed by Google, or any other search engine for that matter. My only rant is the frequency (aka everyday) of the visits to sites that have a HISTORY of not being updated for weeks, possibly months, at a time. What is the logic of sending a spider to that site on a daily basis?

    mrbshouse

    We have had problems with Google getting caught in a loop on one of the directories we have on about 3 different sites. The directory contains pages generated by a program several of us use to present our genealogy information (aka database). The program generates a huge number of PHP pages on the fly. We finally came up with the right wording for the robots.txt file so the infinite loop is avoided. This has lessened the load tremendously.
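
    A rule of this kind can be checked before it goes live with Python's standard urllib.robotparser module. This is only a sketch: the /genealogy/ directory and the example.com URLs are stand-ins, since the actual rules are not given in this thread, and is_allowed is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /genealogy/ stands in for the generated-page directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /genealogy/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(
    ROBOTS_TXT, "Googlebot", "http://www.example.com/genealogy/person.php?id=42"
)
allowed = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html")
print("generated page allowed:", blocked)
print("home page allowed:", allowed)
```

    Python's parser follows the original robots.txt conventions, so the result is a sanity check rather than a guarantee of how any particular crawler will behave.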

    The site I was using as an example with 9400 times contains 20,000 individuals in the database (each having a separate page generated 'on the fly'). Again it isn't the number of times it hits during a visit, but the *frequency* of the visits.

    It looks like the Google SiteMap program may be our answer in controlling the frequency of visits.

    I've just read the last two replies and wish I could answer mrbshouse's question about how much bandwidth we are talking about, but hubby is a very late sleeper and is still snoring (very loudly). I am not the geek in the family and will have to let you know that answer after he has had at least 2 cups of coffee.

    Thanks to everyone for their interest.

  10. #10
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    To control the frequency of robot visits use this in the meta:

    [meta name="robots" content="index,follow"]
    [meta name="revisit-after" content="10 days"]

    change [ to <

    ] to >

    Change 10 to 30 if you'd really like to, but 10 is usually a good number to specify.
    ~Rhia7 -- Remember the 7
    Twitter me

  11. #11
    Member genbintn's Avatar
    Join Date
    November 29th, 2005
    Location
    Lincoln, NE
    Posts
    70
    Quote Originally Posted by Rhia7
    To control the frequency of robot visits use this in the meta:

    [meta name="robots" content="index,follow"]
    [meta name="revisit-after" content="10 days"]

    change [ to <

    ] to >

    Change 10 to 30 if you'd really like to, but 10 is usually a good number to specify.
    Rhia,

    Thanks. I feel like such an idiot not thinking about using meta tags.

    I'll let you know tomorrow, if that had any effect.

  12. #12
    Analytics Dude Kevin's Avatar
    Join Date
    January 18th, 2005
    Location
    Rochester, NY
    Posts
    5,904
    Don't feel dumb. There's only 18 million things to know in this business....
    Kevin Webster
    twitter: levelanalytics

    Kayak Fishing
    Web Analytics and Affiliate Marketing

  13. #13
    Member
    Join Date
    January 18th, 2005
    Posts
    135
    Spidering your pages 2000 times a day.... that doesn’t sound right. That would be highly unusual. Have you checked your website's bots info where it tells the search engines how often to visit? Maybe something in there is configured wrong and is causing this.

    Also, you could be getting some of these hits from Google Images. That could easily show up as 2000 hits a day. In fact, one of my sites gets between 1700-2300 hits a day from the Google image search.

    It looks like a hit from Google because, unlike the normal Google keyword search, an image search opens your site within a Google frame, so it shows in your stats as being Google. If the hits are from Google's image search, the entries in your log would look something like this: http://images.google.com/imgres?imgu...pg&imgrefurl=h
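
    To tell the two apart in a log, a sketch like the following can help. It assumes the crawler identifies itself with "Googlebot" in the user-agent, while image-search visitors arrive with an images.google.* referrer whose path is /imgres. The function name and the sample referrers are made up for illustration.

```python
from urllib.parse import urlparse


def classify_google_hit(referrer, user_agent):
    """Label a log entry as a crawler hit or a visitor sent by Google."""
    if "Googlebot" in user_agent:
        return "crawler"
    parsed = urlparse(referrer)
    host = parsed.hostname or ""  # a "-" referrer has no hostname
    if host.startswith("images.google.") and parsed.path == "/imgres":
        return "image search visitor"
    if host.startswith("google.") or ".google." in host:
        return "web search visitor"
    return "other"


# Hypothetical examples of each kind of entry.
crawler = classify_google_hit("-", "Mozilla/5.0 (compatible; Googlebot/2.1)")
image_visit = classify_google_hit(
    "http://images.google.com/imgres?imgurl=http://www.example.com/photo.jpg",
    "Mozilla/4.0 (compatible; MSIE 6.0)",
)
web_visit = classify_google_hit(
    "http://www.google.com/search?q=genealogy",
    "Mozilla/4.0 (compatible; MSIE 6.0)",
)
print(crawler, "|", image_visit, "|", web_visit)
```

    Only the entries labeled as crawler are spider traffic; the rest are real people arriving from a Google results page.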

  14. #14
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    [meta name="robots" content="index,follow"]
    That needs to be

    <meta name="robots" content="index,follow">

    which is the default, and what the engines will do even if the tag isn't there at all.

    [meta name="revisit-after" content="10 days"]
    Would also need to be

    <meta name="revisit-after" content="10 days">

    IF it were worth anything at all, which it isn't. There isn't one engine that observes that tag; it's a worthless waste of time.

    They all crawl according to their own frequency criteria for different sites, and according to pre-set crawl algorithms. Search engines have "crawl engineers" who specifically work with their crawling, and they're not going to change their set algos for what individual webmasters put in their meta tags.

    There are some parameters that can be put in for some engines (like MSN) to slow down the speed of crawling and hitting files, but there's never been any evidence that any engine customizes their crawl schedules by webmaster instructions.
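
    The parameter usually meant here is the Crawl-delay line in robots.txt, which is a non-standard extension. The sketch below assumes msnbot as the user-agent token and a hypothetical delay of 10 seconds, and reads the value back with Python's urllib.robotparser (crawl_delay needs Python 3.6 or later). As far as I know Googlebot ignores this directive, so it only slows the crawlers that support it and does not change how often any of them comes back.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask msnbot to pause between requests,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay returns None when no delay applies to the given crawler.
msn_delay = parser.crawl_delay("msnbot")
google_delay = parser.crawl_delay("Googlebot")
print("msnbot delay:", msn_delay)
print("Googlebot delay:", google_delay)
```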

    Sometimes the back end of some sites can send spiders into some kind of loop, but that's a different story - nothing to do with that tag.

    Added:

    If there is a real problem with how Googlebot is hitting the site, drop Google a line at the appropriate email address and they can check into it. It's on this page:

    Googlebot: Google's Web Crawler

  15. #15
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    Hubby and I run a small hosting service
    It's fine to limit Google's crawling of your own sites, but that isn't necessarily so with sites that you host for others.

  16. #16
    Full Member Tech Evangelist's Avatar
    Join Date
    March 16th, 2005
    Location
    Mesa, AZ
    Posts
    374
    <meta name="revisit-after" content="10 days">
    I hate to tell you this, but this Meta tag has been obsolete for more than 5 years. No search engine uses it. That's common knowledge in the SEO industry. Most Meta tags have been so widely misused over the years that search engines ignore most of them.

    There are a couple of things you could do to lessen the GoogleBot bandwidth issue, although I agree that most people consider daily visits from GoogleBot to be a good thing. Perhaps your best tactic would be to steer GoogleBot.

    You can limit access to certain areas of the site using the robots.txt file. The page that webworker pointed out is the best place for this info: Googlebot: Google's Web Crawler

    See also http://www.robotstxt.org/

    The robots.txt file is far more effective than Meta tags. Every site should have one. I have seen search engines repeatedly ignore "nofollow" Meta tags.
    There's good, fast and cheap. Pick any two.
    Phoenix SEO: http://www.topranksolutions.com :: Affiliate Marketing Tutorials: http://www.tech-evangelist.com/category/affiliate-marketing/
