May 31st, 2009, 08:44 PM #1
Directories and Pages in URL after index.php?
superCool is doing some SEO research for a company and found a very odd looking situation with their website. Google shows a bunch of copies of their home page with odd bits appended to the end of the file name. for example:
directory1, directory2 and directory3 are valid directories but the sequence is not correct. all of the valid page names for the site are duplicated for each combination of index.php/pagex.php and all the directory combinations as well.
when you click the link on Google it shows the home page but the css and image files are not found so nothing is formatted.
The site has 2 different Flash navigation menus (used separately). Not sure if that is part of the issue but the "flash" directory is always the one that follows index.php and is sometimes duplicated between other directory names.
superCool ran some additional Googles and found another site with the exact same situation. the 2nd site is another client of the same web design firm, so obviously the designer is doing something that is causing this.
looking at the cached versions in Google the dates are spread anywhere from early-April to mid-May. so apparently these are not old pages from when the site was being developed...the problem seems to be continuing.
superCool has to talk about this tomorrow and would like to not sound like a complete fool. Does anyone understand any of this? Is it valid to have a directory and/or page name after another page name in the url (index.php/directory1/page1.php)? Any idea what could be causing these?
thanks for any advice,
May 31st, 2009, 08:59 PM #2
That does seem very odd.
In this case to me index.php looks like a folder name.
May 31st, 2009, 09:19 PM #3
Here's a live example of this, if anyone wants to see it:
That site is powered by MediaWiki, which is a wiki that can be used as a content management system. It could be that the sites you saw were using this software or perhaps a different one that is structured similarly.
The way the URLs are structured, they are using URL rewriting via htaccess or httpd.conf. As it explains at http://www.mediawiki.org/wiki/Manual:Short_URL :
MediaWiki's default page addresses look like these examples:
http://example.com/w/index.php/Page_title (recent versions of MediaWiki, unless using CGI)
http://example.com/w/index.php?title=Page_title (recent versions of MediaWiki, using CGI)
Using the methods below, short webpage addresses can be changed to addresses such as these:
http://example.com/wiki/Page_title (this is the standard, same as in Wikipedia)
http://wiki.example.com/Page_title (not recommended!)
In terms of SEO, it's obviously better to eliminate the index.php layer if possible. I think older versions of MediaWiki had the index.php in the URL, although it could be eliminated by updating the URL rewriting rules. It looks like newer versions have this fixed by default. So maybe your client has to update their software or at least update the rewriting rules.
Let me know if I lost you with that "explanation"....
May 31st, 2009, 09:43 PM #4
Thanks for that - I'll look into it further to see if that could be it. In superCool's case, all of the pages are identical (the unformatted home page). I don't think this was done on purpose. This is a new site and it's not sophisticated at all (lots of very basic problems).
Here's a better example that looks almost exactly like what superCool is seeing
Here's a couple more -
the page does not show images or styling. if you remove everything after .com you will see the site as it should be.
it seems to be related to flash...??
May 31st, 2009, 10:24 PM #5
If you look at the HTML source code for http://www.lillydawsoncasting.com/index.php and http://www.lillydawsoncasting.com/index.php/flash/about.php and http://www.lillydawsoncasting.com/index.php/flash/objective.php, you will see that the HTML is identical for them all. The reason the latter two pages don't look the same as the first one is that all the href and src values in the HTML are using relative paths ("images/example.jpg") rather than absolute paths ("/images/example.jpg"). And as a result, the images and flash and scripts don't load for the latter two pages and it ends up looking like a stripped down version of the site. But it's all unintentional.
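To make the relative vs. absolute point concrete, here is a minimal sketch using Python's standard `urljoin`, which resolves references the same way a browser does. The two page URLs and the two image paths are the ones quoted above; the `resolve` helper name is just for illustration.

```python
from urllib.parse import urljoin

RELATIVE_REF = "images/example.jpg"    # relative path, as used in the site's HTML
ROOTED_REF = "/images/example.jpg"     # path starting at the site root

PAGES = [
    "http://www.lillydawsoncasting.com/index.php",
    "http://www.lillydawsoncasting.com/index.php/flash/about.php",
]

def resolve(page_url, ref):
    """Return the URL a browser would request for `ref` found on `page_url`."""
    return urljoin(page_url, ref)

for page in PAGES:
    print(page)
    print("  relative ->", resolve(page, RELATIVE_REF))
    print("  rooted   ->", resolve(page, ROOTED_REF))
```

On the second page the relative reference resolves underneath `/index.php/flash/`, where no image exists, while the rooted reference resolves to the same file from either page.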
I do think the sites have a URL rewriting rule configured on the server to interpret those funny looking URLs. That's why http://www.lillydawsoncasting.com/index.php/flash/about.php shows some content instead of the expected 404 error page. You could even try http://www.lillydawsoncasting.com/index.php/superCool or http://www.lillydawsoncasting.com/about.php/superCool and it would show the same thing.
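Nobody in this thread has seen the server config, so the cause is unconfirmed. One mechanism that produces the same symptom even without a rewrite rule is the web server treating everything after the script name as extra "path info" and still running the script (Apache exposes this as PATH_INFO and controls it with the AcceptPathInfo directive). The sketch below only illustrates that split; the function name is made up, and a real server decides by checking which part of the path is an actual file, not by looking for ".php".

```python
def split_script_and_path_info(url_path, extension=".php"):
    """Split a URL path into (script, trailing path info).

    Simplified illustration: the first segment ending in `extension`
    is treated as the script, and the rest is passed along as extra
    path information that the script is free to ignore.
    """
    segments = url_path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(extension):
            script = "/".join(segments[: i + 1])
            extra = "/".join(segments[i + 1:])
            return script, ("/" + extra if extra else "")
    return url_path, ""

for path in ["/index.php", "/index.php/flash/about.php", "/about.php/superCool"]:
    print(path, "->", split_script_and_path_info(path))
```

If the script never looks at the trailing part, every one of those URLs returns the same HTML as the plain script URL, which matches what the example URLs above do.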
For some general information about URL rewriting, see http://en.wikipedia.org/wiki/Mod_rewrite
May 31st, 2009, 10:26 PM #6
Maybe it's pertaining to absolute vs relative urls.
I got weird error flags in Webmaster Tools for a few URLs that had extra directories tacked onto them. When I checked, it only happened to links that were relative, not absolute. I have a few redirects in htaccess, so maybe that caused the relative URLs not to work. I changed them to absolute and the errors are gone.
At least if you changed the link to the css file it could help.
We don't want SuperCool not to look Super Cool tomorrow. Na-ah.
May 31st, 2009, 11:08 PM #7
yes ghoti you seem to be on to something. superCool goes to the site and keys in index.php/anything and it goes to the same page. so maybe they have this set up for some reason. do you think there is something in the site navigation (maybe relative urls) that is causing google-bot to follow invalid links, but due to the mod-rewrite it finds a page and indexes it?
this is not superCool's site or code, so not sure if he will be able to dig this out. do you suppose there is a band-aid patch for it?
so anyway, the problem is that there are 75 bogus URLs indexed by Google with the home page content. superCool would like to get rid of them.
super thanks to all
Last edited by superCool; May 31st, 2009 at 11:22 PM.
June 1st, 2009, 08:55 AM #8
I had a problem similar to your issue when I was testing / crawling my website ...
The problem was caused by a URL link that wasn't coded right.
Check the URLs / links in the base page and make sure they are all coded correctly for the URL rewrites.
OpA! Giasou Ti kanies!
June 1st, 2009, 09:38 AM #9
thanks Magi. are you saying the link on a page was wrong <a href=.....> or was there a problem in the Mod-Rewrite code? If you can share any specifics please PM superCool.
thanks again to everyone
June 1st, 2009, 09:44 AM #10
ladidah, they do use relative urls with ./ etc.. do you mean that you changed them to include your whole url (http://www.xxx.com/page) or did you change them to start at the root (/page)? superCool usually uses root-relative links (/page) and has not had problems. are absolute links better?
superCool does not have access to their Google Webmaster tools yet, but needs to get on there.
June 1st, 2009, 11:49 AM #11
ghoti is correct. Take a look at their htaccess file first and see what is going on.
Do you have access to it?
June 1st, 2009, 01:35 PM #12
If you are saying they are all dupes of the same page(s) and just not showing style I suggest:
Either of these two
1) Delete all those extra pages and set up 301 redirects to their corresponding pages in the htaccess file
2) If you don't have access to the htaccess file, you can have them no longer indexed by uploading or editing the robots.txt file, and for safety's sake edit the calling URL for the css file to use a full URL such as http://www.site.com/style.css. That will help with the website's "image/reputation" until those pages are no longer being indexed.
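For option 2, a quick way to sanity-check a robots.txt rule before uploading it is Python's standard `urllib.robotparser`. This is only a sketch: the `Disallow: /index.php/` rule is an assumed example (it blocks anything that continues past index.php with a slash), and www.site.com is the placeholder host from the post.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: block every URL that continues past "index.php/",
# while leaving /index.php itself and normal pages crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/
"""

def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

for url in [
    "http://www.site.com/index.php",
    "http://www.site.com/about.php",
    "http://www.site.com/index.php/flash/about.php",
]:
    print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

The trailing slash in the rule matters: without it, the rule would also block /index.php itself.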
I also would like to hear your follow up of what you end up doing, when you do it
June 2nd, 2009, 09:31 AM #13
thanks for all the info. superCool has done some additional research and it looks like it "might" be partially caused by some relative links in the Flash navigation. The Flash is in a different folder than the pages and according to some info superCool found on the web, googlebot might take those links relative to the Flash folder rather than relative to the folder where the page is. still not sure how it finds the home page when it does this?
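A rough sketch of how this can snowball once a crawler lands on one bad URL: if every index.php/... address returns the same HTML with the same relative links, each crawl round resolves those links one level deeper. The starting URL, the single `flash/about.php` link, and the `crawl` helper are all assumptions for illustration, not taken from the actual site.

```python
from urllib.parse import urljoin

def crawl(start_url, relative_links, rounds):
    """Simulate a crawler on a site where every URL serves the same relative links."""
    seen = [start_url]
    frontier = [start_url]
    for _ in range(rounds):
        next_frontier = []
        for page in frontier:
            for link in relative_links:
                url = urljoin(page, link)  # resolved against the page's own URL
                if url not in seen:
                    seen.append(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return seen

# Hypothetical starting point: one bogus URL that the server answers anyway.
START = "http://www.example.com/index.php/flash/about.php"

for url in crawl(START, ["flash/about.php"], rounds=2):
    print(url)
```

Each round adds another `flash/` segment, which would be consistent with the duplicated "flash" directory in the indexed URLs, though only the designer can confirm what the real links look like.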
anyway, superCool sent what he had to the web designer and hopefully they will be able to figure it out. possibly some Flash problems combined with some redirect and mod-rewrite issues (and who knows what else?).
superCool will post the solution when he hears of it.
thanks again for your help peeps.
your friend, superCool
June 2nd, 2009, 09:45 AM #14
Don't forget to have the designer or someone else with access to Google's webmaster tools request that the erroneous pages be removed once you find the problem.
-rematt
"I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." - Richard Nixon
June 2nd, 2009, 10:25 AM #15
thanks rematt - that's on the list. remove from Google and then monitor to make sure they don't return.