August 5th, 2002, 11:53 AM #1
From PHPWizard.net By: Tobias Ratschiller
You know all about the advantages that dynamically generated web sites offer, but if you want your site to be indexed by search engines, you have to keep in mind how search engines work. This article explains some search engine basics and gives you guidelines for making your dynamic web sites search-engine-friendly.
By Tobias Ratschiller on September 28th, 2001.
Whether e-commerce applications, web-based schedule planners, or personalized portals, dynamic sites are often generated specifically for a single user. Web applications, for example, often assign a session ID to identify a user unambiguously. A URL might look like this: http://www.foo.com/script.php?ID=b6a...6078abf044cdb5
This makes it possible to recognize users across separate pages and, for instance, to show their shopping cart in an online shop. For a search engine, however, it makes little sense to index such a page: the session usually expires after a certain time span, after which the content can no longer be retrieved.
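To see why such addresses are a poor fit for an index, here is a minimal sketch of how a session-ID URL is typically built. The helper `session_url` is hypothetical, and the parameter name `ID` and the 32-hex-character length are assumptions modelled on the (truncated) example URL above.

```python
import secrets
from urllib.parse import urlencode


def session_url(base: str) -> str:
    """Append a fresh random session ID (hypothetical helper)."""
    session_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    return base + "?" + urlencode({"ID": session_id})


first = session_url("http://www.foo.com/script.php")
second = session_url("http://www.foo.com/script.php")
# Same page, two different addresses; an index entry for either one
# points at a session that will eventually expire.
print(first)
print(second)
```

Every visit, including every robot visit, produces a new address for the same content, which is exactly what a search engine cannot usefully store.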
For this reason, many search engines, as a matter of principle, do not index pages whose address (URL) looks dynamic. Typical markers are addresses that contain "cgi-bin", "pl", "?" or "&". A few search engines simply drop the parameters ("?ID=...") and request the page on its own ("script.php").
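The crawler behaviour described above can be sketched as a deliberately naive heuristic. This is a hypothetical illustration, not any real search engine's logic: `looks_dynamic` and `strip_parameters` are made-up names, and the marker list only mirrors the examples in the text (reading "pl" as the ".pl" extension).

```python
from urllib.parse import urlsplit, urlunsplit

# Markers taken from the article's examples; real crawler rules are unknown.
DYNAMIC_MARKERS = ("cgi-bin", ".pl", "?", "&")


def looks_dynamic(url: str) -> bool:
    """Return True if the URL contains any of the 'dynamic' markers."""
    return any(marker in url for marker in DYNAMIC_MARKERS)


def strip_parameters(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


print(looks_dynamic("http://www.foo.com/script.php?category=PHP"))     # True
print(strip_parameters("http://www.foo.com/script.php?category=PHP"))  # http://www.foo.com/script.php
print(looks_dynamic("http://www.foo.com/script/PHP/"))                 # False
```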
This perfectly understandable behaviour leads to one problem, though: many larger sites generate their pages dynamically, for example from a database. Such pages clearly should be indexed by search engines, but, as described above, URLs like http://www.foo.com/script.php?category=PHP cause problems.
Search engine robots, however, are ordinary HTTP clients and cannot see how a page is produced on the server side. And with PHP you can generate almost anything that a web server can send to a client. To get a search robot to index a dynamically generated page, it is enough to make it believe that the page is static. Instead of the extension "php" for a PHP-generated page, you assign an extension such as "html" (the web server then has to be configured to run .html files through PHP). The URL of our example script now looks like this: http://www.foo.com/script.html?category=php. If a search engine requests the page without these parameters, a default page should be returned. This works well for pages that do not need any parameters. Sometimes, though, the parameters really do determine the content: an article from the category "PHP" is completely different from an article in the category "Perl", so the parameter "category" is essential.
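The "default page when parameters are missing" behaviour can be sketched as a small fallback. The article would do this in PHP; this Python version is only an illustration, and both `choose_category` and the default value "index" are hypothetical.

```python
from urllib.parse import parse_qs

DEFAULT_CATEGORY = "index"  # hypothetical name for the standard page


def choose_category(query_string: str) -> str:
    """Return the 'category' parameter, or the default when it is missing."""
    params = parse_qs(query_string)
    # parse_qs maps each name to a list of values; take the first one.
    return params.get("category", [DEFAULT_CATEGORY])[0]


print(choose_category("category=php"))  # php
print(choose_category(""))              # index
```

A robot that drops the query string still gets a meaningful page, but it never reaches the category pages, which is the limitation the next paragraph addresses.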
The developer therefore has to find another way to pass parameters. The following URL, for example, simulates a static HTML page: http://www.foo.com/script.html/PHP/. To the robot this looks like an ordinary directory structure: the path component of this URL is /script.html/PHP/. The web server, however, executes "script.html", and the parameter "PHP" is then extracted manually from the path environment variable ($PATH_INFO). A more elegant way: Apache can assign a MIME type to a file directly. You simply name the file "script" (without an extension) and use Apache's ForceType directive to assign it the type application/x-httpd-php. The URL of the script is now http://www.foo.com/script/PHP/, and the parameter is again visible in the path. All search engines index such a page without problems, because it no longer differs from a static HTML page.
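Extracting the parameter from the path can be sketched as follows. Under CGI, the extra path after the script name is exposed as PATH_INFO (for http://www.foo.com/script/PHP/ that is "/PHP/"); `params_from_path_info` is a hypothetical helper that splits on slashes and drops empty segments.

```python
import os


def params_from_path_info(path_info: str) -> list[str]:
    """Split a PATH_INFO value such as '/PHP/' into its non-empty segments."""
    return [segment for segment in path_info.split("/") if segment]


# Under CGI the server sets PATH_INFO; fall back to the article's example.
path_info = os.environ.get("PATH_INFO", "/PHP/")
params = params_from_path_info(path_info)
category = params[0] if params else "index"  # hypothetical default page
print(category)
```

The trailing slash in /script/PHP/ produces an empty segment, which is why empty strings are filtered out.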
Making magic with mod_rewrite
With mod_rewrite you no longer have to work with the path environment manually. Ralf S. Engelschall's mod_rewrite rewrites URLs on the fly; because the rewrite rules (the instructions describing how URLs are to be transformed) can use regular expressions, almost anything imaginable can be done. Further information can be found in the documentation at http://www.apache.org/docs/mod/mod_rewrite.html. Please note that this module is not compiled into Apache by default; to build mod_rewrite as well, pass the following option to the configure script:
--activate-module=src/modules/standard/mod_rewrite.o
For our purposes, a few simple rewrite rules are sufficient. First, the rewrite engine has to be switched on. To do this, write the following configuration directive into a .htaccess file:
RewriteEngine on
With the following rule, all URLs of the form news<id>.html are transformed into shownews.php?id=<id>. So news01.html becomes shownews.php?id=01:
RewriteRule ^news(.*)\.html$ shownews.php?id=$1
Your script can access the variable $id as usual (this relies on PHP's register_globals setting; with it disabled, the value is available as $_GET['id']). The user's browser does not notice the change: to the browser, the file is still called news01.html.
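To check what the rule's regular expression captures, it can be tried outside Apache. This sketch uses Python's `re` module as a stand-in for mod_rewrite's regex engine (Apache's `$1` becomes `\1` here); it only shows the pattern substitution, not how mod_rewrite handles per-directory prefixes or query strings.

```python
import re

# Same pattern and substitution as the RewriteRule above.
NEWS_RULE = re.compile(r"^news(.*)\.html$")


def rewrite(path: str) -> str:
    """Apply the news rule; paths that do not match are returned unchanged."""
    return NEWS_RULE.sub(r"shownews.php?id=\1", path)


print(rewrite("news01.html"))  # shownews.php?id=01
print(rewrite("about.html"))   # about.html (no match)
print(rewrite("news.html"))    # shownews.php?id=  (.* also matches nothing)
```

The last case suggests that a tighter pattern such as `([0-9]+)` may be preferable if ids are always numeric.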
A more general rule passes the name of any .html file to the script:
RewriteRule ^(.*)\.html$ shownews.php?id=$1
This line transforms URLs like foo.html into shownews.php?id=foo.
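Because the general pattern also matches news01.html, the order of the rules matters when both are used. The sketch below applies a list of rules and stops at the first match, which mirrors adding mod_rewrite's [L] (last) flag to each rule; it is a simplification that ignores how mod_rewrite re-processes the rewritten URL, and `apply_rules` is a hypothetical helper.

```python
import re

# (pattern, substitution) pairs in the order they would appear in .htaccess.
RULES = [
    (re.compile(r"^news(.*)\.html$"), r"shownews.php?id=\1"),
    (re.compile(r"^(.*)\.html$"), r"shownews.php?id=\1"),
]


def apply_rules(path: str, rules=RULES) -> str:
    """Return the result of the first matching rule (like the [L] flag)."""
    for pattern, substitution in rules:
        if pattern.search(path):
            return pattern.sub(substitution, path)
    return path  # no rule matched: the file is served as requested


print(apply_rules("news01.html"))  # shownews.php?id=01
print(apply_rules("foo.html"))     # shownews.php?id=foo
print(apply_rules("news01.html", list(reversed(RULES))))  # shownews.php?id=news01
```

With the general rule listed first, news01.html would arrive at the script as id=news01 instead of id=01.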
With a few tricks, spiders and robots can be made to believe they have found static pages, which they then index in the usual way. The methods presented in this article can easily be integrated into your own scripts and, suitably adapted, also work with other server-side scripting languages.
Last edited by BurgerBoy; August 23rd, 2013 at 07:11 AM.
August 5th, 2002, 10:59 PM #2
I really do not understand why this is presented as a problem with search engines. This is simply how life is for e-commerce sites that need to track user behavior and/or shopping carts via session parameters. The problem exists only if you are doing that, and in most cases affiliate sites do not track shopping carts or user behavior via session parameters. (Search engine spiders will run into problems only if the spider's session expires while the spider is still at work, and there are workarounds for that which do not require complicated mod_rewrite work.)
All of the major search engines can now crawl dynamic URLs and index them, as long as those URLs are linked from one of your spiderable pages. I have had dynamic content for over three years and have not seen any search engine spider have a problem crawling and indexing it. (Inktomi only recently started crawling unpaid dynamic pages.)
So name your pages properly: if you have PHP pages, it is safest to give them the .php extension. The rewrite module's intended use was never really to trick search engine spiders by creating static-looking aliases for dynamic URLs.
I am not sure which "few search engines" drop the dynamic parts of URLs, but whichever they are, they are certainly not big-name search engines, and we can live without them.
There is, though, a real problem with Direct Hit, as it hex-encodes URLs, and if Apache is not configured correctly, the user receives a 404.
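If "hex-encodes" here means percent-encoding (an assumption; the post does not say exactly what Direct Hit did), the effect is easy to reproduce: `?` and `=` become `%3F` and `%3D`, so the server no longer sees a query string but a path containing those characters, which matches no file and yields a 404.

```python
from urllib.parse import quote, unquote, urlsplit

original = "/script.php?category=PHP"
# quote() percent-encodes everything except unreserved characters and '/'.
encoded = quote(original)
print(encoded)  # /script.php%3Fcategory%3DPHP

# The encoded form has no query string: the whole thing is now a path.
parts = urlsplit("http://www.foo.com" + encoded)
print(parts.path)          # /script.php%3Fcategory%3DPHP
print(repr(parts.query))   # ''

# Decoding restores the original request.
print(unquote(encoded) == original)  # True
```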