  1. #1
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    Removing Indexed Pages from Google
    superCool has recently been reading more about robots.txt and the robots meta tag. This is what he’s learned – do you agree?

    If Google already has a page indexed and you add a disallow to robots.txt for the page, Google might still show the page in the SERPs. It won’t crawl the page anymore, but it might show in the results – often with a customized/made-up title and snippet (or no snippet).

    If you add a meta robots tag with noindex, Google will not show the page in the results (superCool read this on a google webmaster page and elsewhere).

    If you add BOTH the robots.txt disallow and the meta tag, Google will never see the robots meta tag, because robots.txt stops it from crawling the page again. So in this case the page might remain in the results.
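    To make the conflict concrete, here is a minimal sketch that checks a page for the combination described above. It only uses the Python standard library; the function name `removal_outlook`, the returned labels, and the example URL are made up for illustration, and the logic simply mirrors the reasoning in this post rather than anything Google documents as its exact behavior.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def removal_outlook(robots_txt, html, url, agent="Googlebot"):
    """Return a label for how the two robots mechanisms interact (hypothetical labels)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(agent, url)

    meta = RobotsMetaParser()
    meta.feed(html)
    noindex = "noindex" in meta.directives

    if noindex and not crawlable:
        # The bot is not allowed to fetch the page, so it never reads the tag.
        return "noindex-hidden-by-robots-txt"
    if noindex:
        return "noindex-visible"
    if not crawlable:
        return "disallow-only"
    return "no-directive"


robots_txt = "User-agent: *\nDisallow: /lists/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(removal_outlook(robots_txt, page, "https://example.com/lists/page1.html"))
```

    With the disallow in place the sketch reports that the noindex tag is hidden; drop the `Disallow` line and the same page is reported as having a visible noindex.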

    superCool does not understand the distinction between robots.txt and the meta tag (why treat them differently?), but this was spelled out clearly on two Google pages: "Using meta tags to block access to your site - Webmaster Tools Help" and "Block or remove pages using a robots.txt file - Webmaster Tools Help" (see the 2nd gray box).

    People (including superCool) often complain about Google not removing pages that have been disallowed. Maybe this is one of the reasons? If you disallow using robots.txt, Google might continue to display the page.

    Thoughts? Is this common knowledge?

  2. #2
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Assuming a page is indexed and you want it out of the SERPs:

    Should you put a disallow in robots.txt, the following will occur as the description for your page:
    A description for this result is not available because of this site's robots.txt – learn more.
    Page title and URL are still displayed.
    The page more than likely will never be removed because the header will return a 200 code.

    Once the page is indexed there are three ways to get it removed:

    1. Make that page return a 404 error (not found)
    2. Make that page return a 410 error (gone)
    3. Manually remove it in WMT (best method)
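    As a rough illustration of option 2, the sketch below is a tiny WSGI app that answers 410 for a set of retired URLs. The path in `REMOVED_PATHS` and the helper `status_for` are hypothetical and only there to show the idea; a real site would normally do this in its web server or framework configuration.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical list of pages that should drop out of the index.
REMOVED_PATHS = {"/old-list.html"}


def app(environ, start_response):
    """Answer 410 Gone for retired pages and 200 OK for everything else."""
    path = environ.get("PATH_INFO", "/")
    if path in REMOVED_PATHS:
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This page has been removed.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Regular page.\n"]


def status_for(path):
    """Call the app directly (no network) and return the status line it sent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    seen = []

    def start_response(status, headers):
        seen.append(status)

    app(environ, start_response)
    return seen[0]


print(status_for("/old-list.html"))
print(status_for("/"))
```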


    One thing to be careful about: if your site generates sitemaps dynamically, bots will be very confused about why the sitemap lists a URL that is blocked (one way or another).

    Adding noindex to the meta tag will stop the bot from further attempts to index and SHOULD, eventually, remove it from the SERPs. However, I personally have had to manually remove those pages in WMT.

    Clear as mud?
    Salty kisses, Sandy toes, and a Pirate's heart...

  3. #3
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Convergence knows of what he speaks... heed his wise words above

    If the pages are still on your site (category listings, etc.), noindex is the way to go (not robots.txt). It may take a while, but they will eventually disappear.

  4. #4
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    superCool often attempts both robots methods to remove pages: he adds the robots.txt disallow and also the meta tag to each page. The "news" to superCool is that this will not work, because the bot will not see the meta tag. That bit is the new twist for superCool. Your robots.txt can ruin your attempt to remove the page with a meta tag.

    Although to be honest, superCool did not even know there was a difference between robots.txt and meta until recently.

    On some of superCool's sites he has added a robots.txt disallow to certain list pages, and also the meta tag. These pages seem to stay around forever in the SERPs. The next attempt will be removing the robots.txt disallow and using the meta tag only, to see if they go away. Once they are gone, adding the robots.txt disallow back should be OK. Or so it seems.

  5. #5
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    On a recent re-launch of a site to a new platform we had 240K indexed pages. That was June 10, 2012.

    Today there are about 75K of those old URLs remaining. All OLD URLs go to 404 pages. By reviewing daily stats those 404s that are hit by users can then be filtered and manually removed in WMT.

    In order to manually remove URLs in WMT, the URL must either be disallowed in your robots.txt file, carry a noindex in the header meta tag (there are body metas, too), or return a 404 or 410 header code.

    Don't beat your head against the wall. Check your stats weekly, pull out the 404s and manually remove them...
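    For the weekly check, something along these lines can pull the 404s out of the stats. It is only a sketch and assumes the server writes access logs in the Common/Combined Log Format; the function name `paths_returning_404` and the sample lines are invented for the example.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line,
# e.g. ... "GET /old/page.html HTTP/1.1" 404 512
REQUEST_404 = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" 404 ')


def paths_returning_404(log_lines):
    """Return (path, hits) pairs for requests that ended in a 404, most hit first."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_404.search(line)
        if match:
            hits[match.group("path")] += 1
    return hits.most_common()


sample_log = [
    '1.2.3.4 - - [10/Sep/2012:10:00:00 +0000] "GET /old/a.html HTTP/1.1" 404 512',
    '1.2.3.5 - - [10/Sep/2012:10:01:00 +0000] "GET /new/a.html HTTP/1.1" 200 2048',
    '1.2.3.6 - - [10/Sep/2012:10:02:00 +0000] "GET /old/b.html HTTP/1.1" 404 512',
    '1.2.3.7 - - [10/Sep/2012:10:03:00 +0000] "GET /old/a.html HTTP/1.1" 404 512',
]

for path, count in paths_returning_404(sample_log):
    print(count, path)
```

    The paths at the top of that list are the old URLs users actually hit, which makes them the first candidates for manual removal.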
