  1. #1
    ABW Ambassador Doc Sawyer's Avatar
    Join Date
    January 18th, 2005
    Location
    Southern California Desert
    Posts
    567
    Please,

    Prior to the last Google crawl, I put a robots.txt file in the site's root directory with the following rules:


    user-agent: *
    disallow: /links/
    disallow: /zmail.html


    After this update, when I check all the pages in my site, Google still lists every page in the "links" directory as well as "zmail.html", but it no longer shows a description for any of them the way it used to. Half of what I wanted...

    I wanted to just have the pages go off Google altogether. But, I guess if there is no information, there won't be any search results for those pages. So maybe it will work after all.

    But I am surprised. I thought Googlebot would honor a robots.txt disallow!

    Unless I am missing something...



    "There comes a time in the affairs of a man when he has to take the bull by the tail and face the situation."
    - W. C. Fields (Tillie and Gus)

  2. #2
    ABW Veteran Student Heyder's Avatar
    Join Date
    January 18th, 2005
    Posts
    5,482
    Google has all of its webmaster info listed here. You might already know about this, but in case you don't, here it is.

    http://www.google.com/webmasters/

    hth

  3. #3
    ABW Ambassador Doc Sawyer's Avatar
    Join Date
    January 18th, 2005
    Location
    Southern California Desert
    Posts
    567
    Thanks Heyder,

    I think what I am seeing now is that those particular pages are remaining in the index because they were previously spidered.

    Doc

  4. #4
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    That robots.txt is fine and correct. The trouble with Googlebot is it can obey the robots.txt file and still include disallowed pages, as you have just discovered.
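    If you want to check that for yourself, here is a minimal sketch using Python's standard `urllib.robotparser` module. It only shows which paths a well-behaved crawler would be allowed to fetch under Doc's rules; it says nothing about what ends up in an index. The sample paths and the `check_paths` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted in #1 (field names are case-insensitive).
ROBOTS_TXT = """\
user-agent: *
disallow: /links/
disallow: /zmail.html
"""


def check_paths(robots_txt, paths, user_agent="Googlebot"):
    """Return {path: True/False}, where True means the crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}


# Hypothetical paths, chosen only to exercise the two disallow rules.
SAMPLE_PATHS = ["/index.html", "/links/", "/links/page1.html", "/zmail.html"]
RESULT = check_paths(ROBOTS_TXT, SAMPLE_PATHS)

for path, allowed in RESULT.items():
    print(path, "fetch allowed" if allowed else "fetch blocked")
```

    Everything under /links/ and the single file /zmail.html come back as blocked, and the rest of the site stays open, which is what the file was meant to do.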

    The way it works is this: Google can list a URL in its index without ever fetching the page itself, because it sees the links that point to that URL and can build an entry from those alone. That is why your disallowed pages still show up, but without a description.

    Essentially, a robots.txt disallow means that a search engine will not read a file. It doesn't mean that it can't be indexed. The best example is this:

    User-agent: *
    Disallow: /images/

    The search engines will not read the images, but the images may be included anyway, as the engine can simply read the alt attributes on the pages that reference them.
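    To make the crawl-versus-index distinction concrete, here is a toy model in Python. It is an assumption-laden sketch, not how Google actually works: the `build_toy_index` function, the `ToyBot` user agent, the example.com URLs and the page bodies are all invented. The only real API used is the standard `urllib.robotparser` module. The point is that a disallowed URL can still get an entry from link text alone, just with no description.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
user-agent: *
disallow: /links/
disallow: /zmail.html
"""

# Hypothetical links found on pages the crawler WAS allowed to read:
# (target URL, link text)
FOUND_LINKS = [
    ("http://www.example.com/about.html", "About us"),
    ("http://www.example.com/links/partners.html", "Partner links"),
    ("http://www.example.com/zmail.html", "Contact form"),
]

# Hypothetical page bodies, standing in for what a real fetch would return.
PAGE_BODIES = {
    "http://www.example.com/about.html": "We have been selling widgets since 1999.",
    "http://www.example.com/links/partners.html": "A list of partner sites.",
    "http://www.example.com/zmail.html": "Send us a message.",
}


def build_toy_index(robots_txt, found_links, page_bodies, user_agent="ToyBot"):
    """Index every linked URL, but read the body only where robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    index = {}
    for url, link_text in found_links:
        # The URL is recorded as soon as a link to it is seen.
        entry = index.setdefault(url, {"link_text": [], "description": None})
        entry["link_text"].append(link_text)
        # The description comes from the page body, so it needs a fetch.
        if rules.can_fetch(user_agent, url):
            entry["description"] = page_bodies.get(url)
    return index


INDEX = build_toy_index(ROBOTS_TXT, FOUND_LINKS, PAGE_BODIES)
for url, entry in INDEX.items():
    print(url, "|", entry["description"] or "(no description)")
```

    In this model all three URLs end up in the index, but only about.html carries a description. The two disallowed URLs are known purely from the links pointing at them, which matches the URL-only listings Doc is seeing.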

    Generally speaking, though, the search engines will obey the intent of the robots.txt file and you can expect to see your disallowed pages disappear in the next update.

    Search Engine Positioning - 1 Design 4 Life

  5. #5
    ABW Ambassador Doc Sawyer's Avatar
    Join Date
    January 18th, 2005
    Location
    Southern California Desert
    Posts
    567
    And...

    The wizard speaks

    Thanks Markymark!

    Doc
