August 31st, 2012, 12:21 PM #1
Removing Indexed Pages from Google
superCool has recently been reading more about robots.txt and the robots meta tag. This is what he’s learned – do you agree?
If Google already has a page indexed and you add a disallow to robots.txt for the page, Google might still show the page in the SERPs. It won’t crawl the page anymore, but it might show in the results – often with a customized/made-up title and snippet (or no snippet).
If you add a meta robots tag with noindex, Google will not show the page in the results (superCool read this on a Google Webmaster Tools help page and elsewhere).
If you add BOTH the robots.txt disallow and the meta tag, Google will never see the robots meta tag, because robots.txt stops it from crawling the page again. So in this case the page might remain in the results.
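To make that ordering concrete, here is a minimal Python sketch of a crawler that checks robots.txt before it ever reads a page's meta tags. It uses only the standard library (urllib.robotparser and html.parser); the robots.txt rules, URLs and page markup are hypothetical, and this models the behaviour described above rather than Google's actual pipeline.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records whether the page has <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        values = {name: (value or "").lower() for name, value in attrs}
        if values.get("name") == "robots" and "noindex" in values.get("content", ""):
            self.noindex = True


def crawl_outcome(url: str, robots_txt: str, html: str, agent: str = "Googlebot") -> str:
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The page is never fetched, so its meta tag is never read and an
        # already-indexed URL can linger in the results.
        return "blocked: meta tag never seen"
    meta = RobotsMetaParser()
    meta.feed(html)
    if meta.noindex:
        return "crawled: noindex seen, page can be dropped"
    return "crawled: page stays indexable"


# Hypothetical rules and page, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /old-list/\n"
PAGE = '<html><head><meta name="robots" content="noindex"></head></html>'

print(crawl_outcome("https://example.com/old-list/page1", ROBOTS_TXT, PAGE))
# prints: blocked: meta tag never seen
print(crawl_outcome("https://example.com/list/page1", ROBOTS_TXT, PAGE))
# prints: crawled: noindex seen, page can be dropped
```

The same noindex page gives opposite outcomes depending only on whether robots.txt lets the crawler fetch it.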
superCool does not understand the distinction between robots.txt and the meta tag (why treat them differently?), but this was spelled out clearly on the Google pages "Using meta tags to block access to your site - Webmaster Tools Help" and "Block or remove pages using a robots.txt file - Webmaster Tools Help" (see the 2nd gray box).
People (including superCool) often complain about Google not removing pages that have been disallowed. Maybe this is one of the reasons? If you disallow a page using robots.txt, Google might continue to display it.
Thoughts? Is this common knowledge?
August 31st, 2012, 12:37 PM #2
Should you put a disallow in robots.txt, the following will occur as the description for your page:
A description for this result is not available because of this site's robots.txt – learn more.
The page more than likely will never be removed, because the server still returns a 200 (OK) status code for it.
Once the page is indexed there are three ways to get it removed:
- Make that page return a 404 error (not found)
- Make that page return a 410 error (gone)
- Manually remove it in WMT (best method)
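For the first two options, the server has to stop answering 200 for the retired URL. A minimal sketch with Python's built-in http.server, assuming hypothetical sets of retired paths (GONE_PATHS) and live paths (LIVE_PATHS); on a real site this would normally live in the web server or application configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URL inventories, for illustration only.
LIVE_PATHS = {"/", "/products/"}
GONE_PATHS = {"/old-list/page1", "/old-list/page2"}


def status_for(path: str) -> int:
    if path in LIVE_PATHS:
        return 200
    if path in GONE_PATHS:
        return 410  # Gone: the page was removed on purpose
    return 404  # Not Found: the server has never heard of it


class RemovalAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = status_for(self.path)
        self.send_response(code)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(f"{code}\n".encode())


def serve(port: int = 8000) -> None:
    # Blocks forever; call it explicitly to try the handler locally.
    HTTPServer(("127.0.0.1", port), RemovalAwareHandler).serve_forever()
```

410 tells a crawler the removal is deliberate, while 404 only says the URL is not there right now.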
What you have to be careful with is a site that generates sitemaps dynamically: bots will be very confused about why you are listing a URL there that is blocked (one way or another).
Adding noindex to the meta tag will stop the bot from further attempts to index the page and SHOULD, eventually, remove it from the SERPs. However, I have personally had to remove those pages manually in WMT.
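One way to avoid that contradiction is to filter blocked URLs out while building the sitemap. A minimal sketch, assuming the generator already knows which pages carry noindex (the (url, has_noindex) input format and the example URLs are hypothetical) and that robots.txt is available as text:

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, robots_txt: str, agent: str = "Googlebot") -> str:
    """pages: iterable of (url, has_noindex) pairs (hypothetical input format)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, has_noindex in pages:
        # Leave out anything blocked one way or another, so the sitemap never
        # invites bots to a URL the site also tells them to skip.
        if has_noindex or not rules.can_fetch(agent, url):
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    ("https://example.com/", False),
    ("https://example.com/old-list/a", False),       # disallowed in robots.txt
    ("https://example.com/category/shoes", True),    # carries noindex
]
print(build_sitemap(pages, "User-agent: *\nDisallow: /old-list/\n"))
```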
Clear as mud?
Salty kisses, Sandy toes, and a Pirate's heart...
August 31st, 2012, 01:01 PM #3
Join Date: April 6th, 2006
Convergence knows of what he speaks... heed his wise words above.
If the pages are still on your site (category listings, etc.), noindex is the way to go (not robots.txt). It may take a while, but they will eventually disappear.
August 31st, 2012, 01:02 PM #4
superCool often attempts both robots methods to remove pages: he adds the robots.txt disallow and also the meta tag to each page. The "news" to superCool is that this will not work, because the bot will never see the meta tag. That bit is the new twist for superCool: your robots.txt can ruin your attempt to remove the page with a meta tag.
Although, to be honest, superCool did not even know there was a difference between robots.txt and the meta tag until recently.
On some of superCool's sites he has added a robots.txt disallow to certain list pages, and also the meta tag. These pages seem to stay around forever in the SERPs. His next attempt will be removing the robots.txt disallow and using the meta tag only, to see if they go away. Once they are gone, adding the robots.txt disallow back should be OK, or so it seems.
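The trap described here can be checked mechanically: any URL that carries a noindex tag but is also disallowed in robots.txt will never have that tag read. A minimal audit sketch, assuming you already have a list of the URLs you tagged noindex (the function name, robots.txt rules and URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser


def find_conflicts(robots_txt: str, noindex_urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return noindex URLs that robots.txt prevents the crawler from fetching."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # A noindex tag on a disallowed URL is never fetched, so it cannot take effect.
    return [url for url in noindex_urls if not rules.can_fetch(agent, url)]


conflicts = find_conflicts(
    "User-agent: *\nDisallow: /list/\n",
    ["https://example.com/list/a", "https://example.com/page"],
)
print(conflicts)  # prints: ['https://example.com/list/a']
```

Each URL it reports is a candidate for dropping the disallow until the page has left the index.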
August 31st, 2012, 01:15 PM #5
On a recent re-launch of a site to a new platform we had 240K indexed pages. That was June 10, 2012.
Today there are about 75K of those old URLs remaining. All OLD URLs go to 404 pages. By reviewing daily stats, the 404s that are hit by users can be filtered out and then manually removed in WMT.
In order to manually remove URLs in WMT, you must either disallow them in your robots.txt file, have a noindex in the robots meta tag in your page header (there are body metas, too), or return a 404 or 410 status code.
Don't beat your head against the wall. Check your stats weekly, pull out the 404s and manually remove them...
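Pulling the 404s out of the stats can be scripted against raw access logs. A minimal sketch, assuming the Apache/Nginx "combined" log format and a crude, hypothetical rule that any user agent containing "bot" is a crawler; adjust the pattern to your own server's logs.

```python
import re
from collections import Counter

# Assumes the "combined" log format: request, status, bytes, referer, user agent.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_user_404s(log_lines, limit=50):
    """Count 404 paths requested by visitors, skipping obvious crawlers."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or match.group("status") != "404":
            continue
        # Crude heuristic: treat any user agent containing "bot" as a crawler.
        if "bot" in match.group("agent").lower():
            continue
        counts[match.group("path")] += 1
    return counts.most_common(limit)


sample = [
    '1.2.3.4 - - [31/Aug/2012:13:15:00 -0500] "GET /old/a HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '1.2.3.4 - - [31/Aug/2012:13:16:00 -0500] "GET /old/a HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [31/Aug/2012:13:17:00 -0500] "GET /old/b HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [31/Aug/2012:13:18:00 -0500] "GET /old/c HTTP/1.1" 404 512 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [31/Aug/2012:13:19:00 -0500] "GET /new HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
print(top_user_404s(sample))  # prints: [('/old/a', 2), ('/old/b', 1)]
```

The resulting list, most-hit first, is what you would then work through in WMT by hand.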