  1. #1
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    Directories and Pages in URL after index.php?
    superCool is doing some SEO research for a company and found a very odd looking situation with their website. Google shows a bunch of copies of their home page with odd bits appended to the end of the file name. for example:

    www.xxxxx.com/index.php/page1.php
    www.xxxxx.com/index.php/page2.php
    www.xxxxx.com/index.php/directory1/directory2/directory1(again)/page1.php
    www.xxxxx.com/index.php/directory1/directory2/directory1(again)/directory3/page1.php

    directory1, directory2 and directory3 are valid directories but the sequence is not correct. all of the valid page names for the site are duplicated for each combination of index.php/pagex.php and all the directory combinations as well.

    when you click the link on Google it shows the home page but the css and image files are not found so nothing is formatted.

    The site has 2 different Flash navigation menus (used separately). Not sure if that is part of the issue but the "flash" directory is always the one that follows index.php and is sometimes duplicated between other directory names.

    superCool ran some additional Googles and found another site with the exact same situation. the 2nd site is another client of the same web design firm, so obviously the designer is doing something that is causing this.

    looking at the cached versions in Google the dates are spread anywhere from early-April to mid-May. so apparently these are not old pages from when the site was being developed...the problem seems to be continuing.

    superCool has to talk about this tomorrow and would like to not sound like a complete fool. Does anyone understand any of this? Is it valid to have a directory and/or page name after another page name in the url (index.php/directory1/page1.php)? Any idea what could be causing these?

    thanks for any advice,
    superCool

  2. #2
    ABW Ambassador PatrickAllmond's Avatar
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
    That does seem very odd.

    In this case to me index.php looks like a folder name.

  3. #3
    ABW Ambassador Rehan's Avatar
    Join Date
    November 3rd, 2006
    Location
    Toronto
    Posts
    536
    Here's a live example of this, if anyone wants to see it:
    http://wiki.cmsmadesimple.org/index..../For_All_Users

    That site is powered by MediaWiki, which is a wiki that can be used as a content management system. It could be that the sites you saw were using this software or perhaps a different one that is structured similarly.

    The way the URLs are structured, they are using URL rewriting via htaccess or httpd.conf. As it explains at http://www.mediawiki.org/wiki/Manual:Short_URL :
    MediaWiki's default page addresses looks like these examples:

    http://example.com/w/index.php/Page_title (recent versions of MediaWiki, unless using CGI)
    http://example.com/w/index.php?title=Page_title (recent versions of MediaWiki, using CGI)

    Using the methods below, short webpage addresses can be changed to addresses such as these:

    http://example.com/wiki/Page_title (this is the standard, same as in Wikipedia)
    http://wiki.example.com/Page_title (not recommended!)
    ...and in the case of some sites, like the one I linked to above, the short address would be like http://example.com/index.php/Page_title

    In terms of SEO, it's obviously better to eliminate the index.php layer if possible. I think older versions of MediaWiki had the index.php in the URL, although it could be eliminated by updating the URL rewriting rules. It looks like newer versions have this fixed by default. So maybe your client has to update their software or at least update the rewriting rules.

    Let me know if I lost you with that "explanation"....

  4. #4
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    Thanks for that - I'll look into it further to see if that could be it. In superCool's case, all of the pages are identical (the unformatted home page). I don't think this was done on purpose. This is a new site and it's not sophisticated at all (lots of very basic problems).

    Here's a better example that looks almost exactly like what superCool is seeing
    http://www.lillydawsoncasting.com/in...lash/about.php

    Here's a couple more -
    http://www.diaprohealthcare.com/inde...lash/about.php
    http://www.frootful.co.uk/index.php/...bout.htm?id=77

    the page does not show images or styling. if you remove everything after .com you will see the site as it should be.

    it seems to be related to flash...??

    thanks

  5. #5
    ABW Ambassador Rehan's Avatar
    Join Date
    November 3rd, 2006
    Location
    Toronto
    Posts
    536
    If you look at the HTML source code for http://www.lillydawsoncasting.com/index.php and http://www.lillydawsoncasting.com/index.php/flash/about.php and http://www.lillydawsoncasting.com/index.php/flash/objective.php, you will see that the HTML is identical for them all. The reason the latter two pages don't look the same as the first one is that all the href and src values in the HTML are using relative paths ("images/example.jpg") rather than absolute paths ("/images/example.jpg"). And as a result, the images and flash and scripts don't load for the latter two pages and it ends up looking like a stripped down version of the site. But it's all unintentional.
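
    To see why the same HTML renders differently at the two addresses, here is a minimal Python sketch of how a relative reference is resolved against the URL of the page it sits on. The host www.example.com and the helper name resolve are made up for illustration; urljoin from the standard library applies the usual resolution rules.

```python
from urllib.parse import urljoin

# Hypothetical host; the paths mirror the ones discussed in this post.
HOME = "http://www.example.com/index.php"
BOGUS = "http://www.example.com/index.php/flash/about.php"


def resolve(page_url, ref):
    """Resolve a link or src value relative to the URL of the page containing it."""
    return urljoin(page_url, ref)


for page in (HOME, BOGUS):
    print(page)
    # No leading slash: resolved against the "directory" part of the page URL.
    print("  relative:      ", resolve(page, "images/example.jpg"))
    # Leading slash: always resolved from the site root.
    print("  root-relative: ", resolve(page, "/images/example.jpg"))
```

    On the bogus address the relative reference ends up under index.php/flash/, where there is no image to fetch, while the version with the leading slash points at the same file from both pages.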

    I do think the sites have a URL rewriting rule configured on the server to interpret those funny looking URLs. That's why http://www.lillydawsoncasting.com/index.php/flash/about.php shows some content instead of the expected 404 error page. You could even try http://www.lillydawsoncasting.com/index.php/superCool or http://www.lillydawsoncasting.com/about.php/superCool and it would show the same thing.
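
    As a rough illustration of why all of those addresses return the same page, the sketch below splits a URL path at the first segment ending in .php. Whether these sites get this behaviour from a rewrite rule or from the server handing the trailing part to the script as extra path information is not something this thread establishes; the function name and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def split_script_and_extra(url, script_suffix=".php"):
    """Return (script path, trailing path) split at the first segment ending in script_suffix."""
    path = urlsplit(url).path
    segments = path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(script_suffix):
            script = "/".join(segments[: i + 1])
            extra = "/".join(segments[i + 1 :])
            return script, ("/" + extra if extra else "")
    # No script-like segment found: treat the whole path as the resource.
    return path, ""


for url in (
    "http://www.example.com/index.php",
    "http://www.example.com/index.php/flash/about.php",
    "http://www.example.com/index.php/superCool",
    "http://www.example.com/about.php/superCool",
):
    print(url, "->", split_script_and_extra(url))
```

    Under that assumption, every address that starts with /index.php/ is answered by index.php no matter what follows it, which would match the identical HTML.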

    For some general information about URL rewriting, see http://en.wikipedia.org/wiki/Mod_rewrite

  6. #6
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    Maybe it's pertaining to absolute vs relative urls.

    I got weird error flags in Webmaster Tools for a few urls that had extra directories tacked onto them. When I checked, it only happened to those that were relative urls and not absolute. I have a few redirects in htaccess so maybe that caused the relative urls not to work. I changed them to absolute and the errors are gone.

    At least if you changed the link to the css file it could help.

    We don't want SuperCool not to look Super Cool tomorrow. Na-ah.

  7. #7
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    yes ghoti you seem to be on to something. superCool goes to the site and keys in index.php/anything and it goes to the same page. so maybe they have this set up for some reason. do you think there is something in the site navigation (maybe relative urls) that is causing google-bot to follow invalid links, but due to the mod-rewrite it finds a page and indexes it?
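
    One way to picture that scenario is a small Python sketch under loud assumptions: the crawler has somehow reached one bogus address, every address under index.php/ serves the same home page, and that page contains the two relative links listed in HOME_LINKS. The host, the link names and the crawl function are all invented for illustration; nothing here is taken from the real site.

```python
from urllib.parse import urljoin

# Hypothetical starting point and hypothetical relative hrefs in the home page HTML.
START = "http://www.example.com/index.php/flash/about.php"
HOME_LINKS = ["about.php", "flash/menu.php"]


def crawl(start, links, rounds):
    """Follow relative links for a few rounds, assuming every URL serves the same links."""
    seen = {start}
    frontier = [start]
    for _ in range(rounds):
        next_frontier = []
        for page in frontier:
            for href in links:
                url = urljoin(page, href)  # resolve against the current (bogus) address
                if url not in seen:
                    seen.add(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return sorted(seen)


for url in crawl(START, HOME_LINKS, rounds=2):
    print(url)
```

    Each round adds deeper addresses with the flash directory repeated, and because each of them returns a page instead of a 404, a crawler has no signal to stop.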

    this is not superCool's site or code, so not sure if he will be able to dig this out. do you suppose there is a band-aid patch for it?

    so anyway, the problem is that there are 75 bogus URLs indexed by Google with the home page content. superCool would like to get rid of them.

    super thanks to all
    Last edited by superCool; May 31st, 2009 at 11:22 PM.

  8. #8
    ABW Ambassador Georgie Peri's Avatar
    Join Date
    January 18th, 2005
    Location
    Norwalk, CT
    Posts
    846
    I had a problem similar to your issue when I was testing / crawling my website ...

    The problem was because of a URL link that wasn't coded right.

    Check the URLs / Links in the base page and make sure they are all coded correctly for the URL rewrites.

  9. #9
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    thanks Magi. are you saying the link on a page was wrong <a href=.....> or was there a problem in the Mod-Rewrite code? If you can share any specifics please PM superCool.

    thanks again to everyone

  10. #10
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    ladidah, they do use relative urls with ./ etc. do you mean that you changed them to include your whole url (http://www.xxx.com/page) or did you change them to start at the root (/page)? superCool usually uses root-relative links (/page) and has not had problems. are absolute links better?

    superCool does not have access to their Google Webmaster tools yet, but needs to get on there.

    thank you

  11. #11
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    ghoti is correct. Take a look at their htaccess file first and see what is going on.

    Do you have access to it?

  12. #12
    All Around Web Guy Cursal's Avatar
    Join Date
    January 18th, 2005
    Posts
    829
    If you are saying they are all dupes of the same page(s) and just not showing style, I suggest either of these two:

    Better solution
    1) Delete all those extra pages and add 301 redirects to their corresponding pages in the htaccess file.

    Next best
    2) If you don't have access to the htaccess file, you can have them no longer indexed by uploading or editing the robots.txt file and, for safety's sake, edit the URL that calls the css file to use a full url (http://www.site.com/style.css). That will help with the website's "image/reputation" until those pages are no longer being indexed.
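
    For the robots.txt option, a quick way to check a rule before uploading it is Python's standard robotparser. This is only a sketch: the Disallow line below is an assumption about what would fit the URLs described in this thread, and the host and the helper name is_crawlable are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: block everything *under* index.php/ but not index.php itself.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler that obeys robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "http://www.example.com/index.php",
    "http://www.example.com/about.php",
    "http://www.example.com/index.php/flash/about.php",
):
    print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

    The trailing slash in the rule matters here: it leaves the real home page at /index.php open while covering the addresses that have something appended after it.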

    I also would like to hear your follow up of what you end up doing, when you do it

  13. #13
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    thanks for all the info. superCool has done some additional research and it looks like it "might" be partially caused by some relative links in the Flash navigation. The Flash is in a different folder than the pages and according to some info superCool found on the web, googlebot might take those links relative to the Flash folder rather than relative to the folder where the page is. still not sure how it finds the home page when it does this?
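
    A small Python sketch of the two readings of a relative link inside the Flash menu, assuming a page at the site root and a Flash file in a flash folder. Both URLs, the link target and the function name are hypothetical; how googlebot actually treats links found in Flash is not confirmed anywhere in this thread.

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.example.com/about.php"       # hypothetical page that embeds the menu
SWF_URL = "http://www.example.com/flash/menu.swf"   # hypothetical location of the Flash file


def resolve_menu_link(href, base):
    """Resolve a relative link from the Flash menu against the chosen base URL."""
    return urljoin(base, href)


href = "./contact.php"  # hypothetical relative link stored inside the Flash menu
print("relative to the page: ", resolve_menu_link(href, PAGE_URL))
print("relative to the Flash:", resolve_menu_link(href, SWF_URL))
```

    If the second reading is the one being used, every menu link picks up an extra flash/ directory, which would fit the flash folder showing up in the bogus addresses.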

    anyway, superCool sent what he had to the web designer and hopefully they will be able to figure it out. possibly some Flash problems combined with some redirect and mod-rewrite issues (and who knows what else?).

    superCool will post the solution when he hears of it.

    thanks again for your help peeps.

    your friend, superCool

  14. #14
    The Seal of Aproval rematt's Avatar
    Join Date
    November 19th, 2006
    Location
    The Windy City
    Posts
    4,140
    Don't forget to have the designer or someone else with access to Google's webmaster tools request that the erroneous pages be removed once you find the problem.

    -rematt

  15. #15
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    thanks rematt - that's on the list. remove from Google and then monitor to make sure they don't return.
