  1. #1
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Google WMT - redirect vs reindexing question
    I don't want to derail bibby's thread but have a similar issue related to Webmaster Tools http://www.abestweb.com/forums/searc...od-154275.html

    A few weeks ago (with the help of my developer pal), my site URLs were rewritten. To give some background: last year I had attempted .htaccess redirects to shorten the URLs of custom PHP scripts (built by my developer), and majorly screwed things up. It was one of the reasons my site dropped in ranking - the URLs were a mess, with both short and long form navigation on the same page (a classic case of duplicate content). I've spent the last few weeks cleaning it all up - now you can't browse to a bad URL, they all redirect to a proper clean format.

    But I just logged in to WMT, and received the dreaded "Googlebot found an extremely high number of URLs on your site" message. Again.

    It seems the old URL redirects nicely in the browser (via PHP), but not with a 301 status (which would tell Google to drop the page), even though the header was stamped as 301 in the PHP file. Which leaves .htaccess as the only solution.
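    The quickest way to confirm this is to check the raw status code rather than trusting what the browser shows: a browser happily follows a redirect either way, but only a real 301 in the response headers tells Googlebot to drop the old URL. A minimal sketch (the toy server and URL names below are stand-ins for illustration, not the actual site):

    ```python
    # Check the raw HTTP status code of a redirect without following it.
    # The toy server below stands in for the real site and 301s every
    # request; swapping the opener's redirect handler lets us see the
    # status code itself instead of the final landing page.
    import http.server
    import threading
    import urllib.error
    import urllib.request

    class OldUrlHandler(http.server.BaseHTTPRequestHandler):
        """Toy stand-in: 301s every request to a hypothetical clean URL."""
        def do_GET(self):
            self.send_response(301)
            self.send_header("Location", "http://example.com/clean-page")
            self.end_headers()

        def log_message(self, *args):  # keep the output quiet
            pass

    class NoFollow(urllib.request.HTTPRedirectHandler):
        """Stop urllib from following redirects so the raw code surfaces."""
        def redirect_request(self, *args, **kwargs):
            return None

    server = http.server.HTTPServer(("127.0.0.1", 0), OldUrlHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    opener = urllib.request.build_opener(NoFollow)
    try:
        opener.open("http://127.0.0.1:%d/old-page.php" % server.server_port)
        code = 200  # no exception means no redirect status was sent
    except urllib.error.HTTPError as err:
        code = err.code  # the status Googlebot actually sees
    server.shutdown()
    print(code)  # -> 301
    ```

    If the PHP script were serving a 200 (as described in this post), the `opener.open` call would succeed and `code` would come back 200 even though the browser appears to "redirect".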

    Trying to configure .htaccess has cost me too much time already - should I just 301 the old (bad) URLs to a fresh new default page..?
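    For reference, if .htaccess does end up being the route, the "one default page" option being asked about can be a single mod_rewrite rule. A hypothetical sketch (file names are placeholders, not the actual scripts):

    ```apache
    # Hypothetical sketch: 301 any request for the old script, with or
    # without a query string, to one clean landing page. The trailing "?"
    # on the target drops the old query string from the redirect.
    RewriteEngine On
    RewriteRule ^old-list\.php$ /clean-landing.php? [R=301,L]
    ```

    Translating the parameters instead would need one rule (or a RewriteMap) per URL pattern, which is where the extra mess comes in.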

    The new convention has started to get indexed; I'm just trying to drop the old URLs.. it would be nice to translate the parameters, but I'm worried about creating even more of a mess!
    Last edited by teezone; March 14th, 2012 at 06:27 AM.

  2. #2
    ABW Ambassador kse's Avatar
    Join Date
    November 29th, 2005
    Posts
    2,511
    All I can say is that I use "301 redirects" when I remove a page or RENAME a page, and so far I've never had a problem in that area.

    Like you, I've spent most of my time lately fixing up problems found in Google Webmaster Tools - I've been spending most of the last 4 weeks fixing things to make Google happy.

    I'm just wondering how much time people are spending fixing problems that Google reports instead of doing other affiliate work??

    Good luck!!
    MERCHANTS: Start showing your coupons directly on your site, that way your shoppers will stop leaving your site looking for them!! If not then remove your Coupon Box!!

  3. #3
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Thanks!

    It seems my PHP solution was incomplete.. the browser shows the new (clean) URL, but the response code is still 200, despite the correct 301 header in my code.

    Very weird... and I can't find any reports of a similar problem.

    I would love to 301 every old url (with assorted parameters), but one default page will get rid of this problem quickly.. was just wondering if anyone had thoughts on this..

  4. #4
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Quote Originally Posted by teezone View Post
    I would love to 301 every old url (with assorted parameters), but one default page will get rid of this problem quickly.. was just wondering if anyone had thoughts on this..
    Would suggest NOT sending all the old pages to one default page via 301. It's actually best if you direct each page to its new page or to a 404 page.

    If you're dealing with URL parameters, there is a spot in both the Google WMT and in Bing's WTB where you can tell them how to treat specific URL parameters. Don't know if this will work in your case, but wanted to throw it out there.
    Salty kisses, Sandy toes, and a Pirate's heart...

  5. #5
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Thanks Convergence, I appreciate the advice..
    best if you direct each page to its new page or a 404 page.
    Redirecting each page has turned into a pipe dream.. I've spent too many sleepless nights trying to tweak htaccess, but will give it one last try today.

    I've now directed old urls to a 404 page - it's similar to my 301 landing page layout, but with the proper 404 header, fully tested.

    This has been a nightmare.. I've accepted the fact I took a hit in G for the mess, but trying to fix it properly has taken weeks.

  6. #6
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Update: I decided to revisit WHY my PHP redirect wasn't working - the browser URL looked fine, but it was missing the all-important 301 for Google.

    Used this tool: View HTTP Request and Response Header

    It showed me header error messages at the top of the page... and with the help of my developer pal, the error was fixed. Now I have proper 301s in place for all the parameters!

    Thanks Convergence for your insight, I learned something new today re: 404 vs 301

  7. Thanks From:
    kse

  8. #7
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Salty kisses, Sandy toes, and a Pirate's heart...

  9. Thanks From:
    kse

  10. #8
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    As an update, I'm still seeing Google crawl pages that were 301'ed more than a month ago...

    Just to recap, I had navigation issues on a section of my site that spawned hundreds of thousands of urls, all considered duplicate content. 5 weeks ago, I added the proper 301 to redirect the URLs that I wanted to remove/consolidate.

    I didn't expect the problem to be sorted overnight, but I'm now considering a disallow in robots.txt to the old URLs. However - in theory - if you disallow a problem area on your site, Google won't see that it has been fixed.

    Does anyone have an opinion how long one should wait before applying the disallow? There are no internal links to the old area that caused the problem... the only traffic is Googlebot..

    I'm finding it tough to be patient, any input would be appreciated!
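    For anyone weighing the same trade-off, the disallow under consideration would be a one-liner like this (script name hypothetical) - with the catch noted above: once a URL is blocked, Googlebot can no longer fetch it, so it never sees that the 301 fix is in place:

    ```
    # Hypothetical robots.txt entry - blocking crawling also blocks
    # Googlebot from ever seeing the 301 on these URLs.
    User-agent: *
    Disallow: /old-list.php
    ```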

  11. #9
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    You've done your job.

    You've told the Google and every other bot / human that the old links redirect with a 301.

    You have new pages in your sitemap and the OLD links are not, correct?

    Are the old pages still indexed?

    The Google doesn't believe anyone. They will index and index and index forever. Even if you put in 404 redirects...
    Salty kisses, Sandy toes, and a Pirate's heart...

  12. #10
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    You have new pages in your sitemap and the OLD links are not, correct?
    Correct, but that sitemap is not being added to the index.
    Are the old pages still indexed?
    Yes.. what's there is 50% old, 50% new (but 90% gone).

    Most pages were actually deindexed (which I understand).. my CMS content is fine, but this was a custom area that added value for my SEO (and for visitors) - it has some unique features that were developed without any canned scripts. I also found (last week) that my developer didn't properly code the 404s with the Apache header.

    It has been a perfect storm - duplicate content and soft 404s.
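    To make the "soft 404" half of that storm concrete: a soft 404 is a "not found" page served with a 200 status, so the status line and the page content disagree, and crawlers keep treating the page as live (duplicate) content. A toy classifier, with a made-up hint list purely for illustration (this is not Google's actual detection logic):

    ```python
    # A "soft 404" returns HTTP 200 with not-found content. The hint
    # phrases below are invented for illustration only.
    NOT_FOUND_HINTS = ("page not found", "no longer available")

    def classify(status_code: int, body: str) -> str:
        """Label a response as a hard 404, a soft 404, or normal content."""
        looks_missing = any(hint in body.lower() for hint in NOT_FOUND_HINTS)
        if status_code == 404:
            return "hard 404"   # status and content agree: page is gone
        if status_code == 200 and looks_missing:
            return "soft 404"   # 200 status wrapped around an error page
        return "content"

    print(classify(200, "Sorry, this page is no longer available"))  # -> soft 404
    print(classify(404, "Sorry, page not found"))                    # -> hard 404
    print(classify(200, "Welcome to the product list"))              # -> content
    ```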

    I'm assuming both issues negatively impacted my site, and want G to see the problem has been fixed, so indexed pages can be restored.. right now, it feels like I have lost 2 years of hard work. And it's making me reconsider my efforts..

    Added: The "old" URL has its own php file, which is redirected to a new php file, with correct handling - just to clarify, I'm thinking of disallowing the old php file.

  13. #11
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    You're not dead unless you don't see them indexing.

    Had two sites we neglected because the merchant changed the way their datafeeds were constructed. Hadn't done anything with the sites for about six months. The Google stopped coming over to play and pages fell out of the index. In early February we finally found two niche merchants to replace products on one of the sites. The Google returned with a vengeance, and this month it's our number ONE traffic site (and we have just a "few" sites).

    Did the same with the other site earlier this month, and already traffic is starting to come back.

    Patience. It will recover...
    Salty kisses, Sandy toes, and a Pirate's heart...

  15. #12
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Thanks for the wisdom, much appreciated!

    So you don't suggest I disallow the old (bad URL-causing) php file in robots.txt?

    G still shows links to one page from (potentially) 8-10 bad URLs (generated by the old php file) - they are slow to acknowledge 301s, and I'm worried this coding error will cost me 2-3 more months, which I don't have..

    I wish my problem was a neglected site that didn't do anything wrong; unfortunately, I'm stuck with honest but faulty coding errors that caused a penalty.

  16. #13
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    The only reason, personally, for not rushing to kill off the old pages is that you lose the chance of a human finding the old links and still getting through to your new pages.

    Wouldn't block the old pages in robots.txt - that really seems to piss off the Google, LOL. Would send those pages to a 404 page after waiting a little longer...
    Salty kisses, Sandy toes, and a Pirate's heart...

  17. #14
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    Quote Originally Posted by teezone View Post
    So you don't suggest I disallow the old (bad URL-causing) php file in robots.txt?
    I would suggest not disallowing it in robots.txt since the pages have already been indexed. If you do, you risk ending up with pages listed with only the URL as the title and no description, since Googlebot will be stopped cold at the door by robots.txt. It will then use the URL as the title but leave the description blank. If you have many of these pages, you will end up with many of these blanks, and hence in the supplemental index. Although you may not have any links to these pages, Google has them in its dinosaur archives/datacenters, will keep linking to them, and will remember them for a loooong time.

    Convergence is on to something: in WMT you can tell Google to ignore a certain parameter, and if you have many parameters, they can inflate your number of URLs. If you have many URLs, it may take Google a while to crawl them all, so one of the simple solutions is to tell them to ignore the parameter. However, I'm not sure what the nature of your URLs, parameters, and redirects was, so it's hard to tell from what you describe.

  18. #15
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    I just wanted to re-open this thread - it has been another 3-4 weeks.

    For two months, my site has been properly redirecting old (datafeed) URLs (list.php) to new, clean, canonical pages (list-clean.php). What's interesting here is that Google continues to crawl old URLs with a vengeance.. sure, it's mixed in with the new structure, but the old URL appears to be infinite in combinations (I'm watching googlebot in the domlog, and see every 301).

    Since these are not content pages (and wouldn't have very many backlinks), do you think it's time to 404/410 the old file..?
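    For the 410 option: Apache's mod_rewrite has a dedicated Gone flag, so retiring the old script (using the list.php name from the post above) could be a sketch as small as:

    ```apache
    # Sketch: answer every request for the retired script with 410 Gone,
    # an explicit "permanently removed" signal (stronger than a 404,
    # which only says "not found right now").
    RewriteEngine On
    RewriteRule ^list\.php$ - [G,L]
    ```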

    I'm at the end of my rope here - everything has been fixed, and tests show navigation is clean as a whistle. The recovery is just not happening, even though I continue to publish content 4-5 times/day...

    My patience is running out, but I don't want to do anything stupid - any suggestions welcome!

  19. #16
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    You can 404/410 them - the Google will spend another month or two trying to index them. It seems, as of late, to be resistant to dropping anything indexed, whereas when this thread started I saw the Google doing a much better job of cleaning out 301s/404s.

    Noticed this myself, just yesterday, on one of our sites that has 240K pages indexed in the Google. However, we only have 109K products on the site. All the other indexed pages are product pages that no longer exist.

    I'm not bothered by it other than it is sending unqualified traffic to the site. Which just adds fodder to "the Google's search results suck".

    Really going to piss off the Google over the weekend. Launching a redesign on that very site. Everything has changed including URL & Category structure. We will be able to do redirects on the categories, but the product URLs will suffer.

    So be it...
    Salty kisses, Sandy toes, and a Pirate's heart...

  21. #17
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    You can 404/410 them
    Going, going.. GONE!

    I just deleted the original file.. to be honest I can't see it getting any worse, and feel the need to wipe the slate clean. Quality backlinks to content won't be impacted, and perhaps the Google (as you call it!) will finally see that the site problems are fixed.

    Thanks for the feedback, it's much appreciated - and good luck with the redesign!

  23. #18
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    So here we are, 12 days later, and there is no change in Google - the original file (now returning 410) continues to be spidered like a bat out of hell. They appear to be crawling endless combinations of parameters, but still haven't acknowledged the base file is gone.

    I've also launched Phase 2 of the project - as part of the rewrite process, a new group of clean URLs were created (with canonical tag, etc), and some of these are duplicate blocks. I had added the folders to robots.txt a few days ago, but just figured out how to "noindex" this subset.

    I'll report back on how long it takes the "noindex" to appear (or disappear, as the case may be!).
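    One common way to "noindex" a whole subset without editing each template is an X-Robots-Tag response header (requires mod_headers; the file pattern here is a hypothetical placeholder). Unlike a robots.txt block, the pages stay crawlable, so Google can actually see the directive:

    ```apache
    # Hypothetical sketch: mark a set of duplicate pages noindex via an
    # HTTP header. The pages remain crawlable, so the directive is seen;
    # a robots.txt Disallow would hide it from Googlebot entirely.
    <FilesMatch "^dup-.*\.php$">
        Header set X-Robots-Tag "noindex, follow"
    </FilesMatch>
    ```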

  25. #19
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Another update on my 301 situation..

    After 3 months, the old php file has finally been dropped from the index... almost FOUR weeks after it started returning a 404. It was coming & going in the past week, but as of today, every search only returns a few instances.

    I can't believe it took this long, but suspect it had more to do with the multitude of parameters that had been spawned (and crawled) over the past year. Regular content 301s have usually been picked up within a reasonable time.

    If I had to do this over again, I might have skipped the 301 and gone straight to 404. Right now, a rule of thumb for complete removal is around the 30-day mark, for anyone going through the same situation!


  26. #20
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Quote Originally Posted by teezone View Post
    If I had to do this over again, I might have skipped the 301 and gone straight to 404. Right now, a rule of thumb for complete removal is around the 30-day mark, for anyone going through the same situation!
    Just completed a full site overhaul on Monday for a website that had 240,000 pages indexed in the Google. We opted to forgo attempting any redirects and pushed everything to a really nice 404 page - we no longer have 240,000 pages indexed; we're now showing 211,000. Of those, nearly a thousand are new links...
    Salty kisses, Sandy toes, and a Pirate's heart...

  28. #21
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Good to know.. and a good decision on your part!

    It probably didn't help my case that I had quite a few leftover content links TO the old file (thought I had done a complete search/replace, which was not the case).

    It's a great feeling to see clean URLs in the index.. too soon to know if it will help my site's recovery, but it's a start.

  29. #22
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    good to hear that your indexed pages are finally getting cleaned up. next up.... super traffic (we hope )

  30. #23
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    It's time for another update!

    The bad URLs appear to have left the index, although I do see G continue to crawl a healthy amount of those pages. The 404 is firmly in place, so I'm guessing this is just residual noise that will dwindle over time.

    I haven't received major credit for these changes yet, but one positive side-effect is that new changes are getting picked up faster. It makes sense as there is less clutter for G to sift through, but the increased speed is quite noticeable. For example, I changed some code on a Joomla template that impacted 500+ pages, and within 10 days, there is now only 1 page left in the index with the old code.

    The problem here is patience.. you want to see immediate results, but that just doesn't happen anymore. There is lots of good advice out there re: 301 vs 404, but if I had to do it all over again, I would have deep-sixed the entire section (ie. 404). Sometimes it is better to wipe the slate clean...

    And in a weird way, this has been a terrific learning experience!

    Hopefully this thread will help someone else going through the same..

  32. #24
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Quote Originally Posted by teezone View Post
    Sometimes it is better to wipe the slate clean
    Yep. Did just that a week ago Monday.

    Started with 240K indexed URLs.

    Today: 197K

    Out of the 197K, 18,151 are new and traffic is already picking back up...
    Salty kisses, Sandy toes, and a Pirate's heart...

  33. #25
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Whoa, Nelly.. Google just sent me a message with the dreaded "Googlebot found an extremely high number of URLs on your site". And most of those URLs reference the "bad" file that I was trying to fix in this thread.

    On May 18th, I wrote:
    Going, going.. GONE!
    I just deleted the original file..
    Why on earth would G be sending me that warning, when they should ALL be 404s..? AND, using the "site:" operator, there is only ONE URL in the index that contains the old syntax. So I receive a warning about URLs they don't even have in the index anymore?

    I also just double-checked Apache logs - Googlebot correctly sees the 404 header.

    There were other URLs listed that have also been fixed via canonical links (ie. they all redirect to a master page, correctly).

    Something has happened in the index - a kind of reset, perhaps. Anyone else seeing unusual activity/results?


