From By: Tobias Ratschiller

<BLOCKQUOTE class="ip-ubbcode-quote"><font size="-1">quote:</font><HR>You know all about the advantages dynamically generated web sites offer - but if you want your site to be indexed by search engines, you have to keep in mind how search engines work. This article shows some search engine basics and provides you with guidelines on making your dynamic web sites search-engine-friendly.

By Tobias Ratschiller on September 28th, 2001.

The problem
Whether they are e-commerce applications, web-based schedule planners, or personalized portals, dynamic sites are often generated for one specific user. Web applications, for example, often assign a session ID to identify a user unambiguously. A URL would then look, for example, as follows:
This makes it possible to recognize users across separate pages, and also to show them their shopping cart in an online shop. For a search engine, however, it makes little sense to index the contents of such a page: usually the session expires after a certain time span, or the content of the page can no longer be retrieved.

For this reason, many search engines do not, as a matter of principle, index pages whose address (URL) looks dynamic. These include, for example, addresses which contain "cgi-bin", "pl", "?" or "&". A few search engines simply drop the parameters ("?ID") and request the page on its own ("script.php").

This perfectly understandable behaviour leads to one problem, though: many larger sites generate their pages dynamically, for example from databases. These pages should obviously be indexed by search engines. But, as already described, there are problems with URLs like

Fooling robots
The robots of search engines, however, are just ordinary HTTP clients and cannot see at all how a page is created on the server side. And with PHP, almost anything that a web server can send to a client can be generated. To get a search robot to index a dynamically generated page, it is sufficient to make it believe that the page is a static page. Instead of the ending "php" for a PHP-generated page, you assign an ending like "html", for example. The URL of your example script now looks as follows:

If a search engine calls up a page without these parameters, a standard page should come up. This works well with pages that do not need any parameters. Sometimes, though, the parameters really do determine the content: an article from the category "PHP" is completely different from an article from the category "Perl", so the parameter "category" is very important.

The developer therefore has to find another way to pass parameters. The following, for example, simulates a static HTML page: For the robot this looks like a normal directory structure: The path component of this URL is /script.html/PHP/. The web server, though, executes "script.html". The parameter "PHP" is then extracted manually from the path information in the environment ($PATH_INFO).

A more elegant way: Apache can assign a MIME type to the file directly. You simply call the file "script" (without an ending) and use Apache's ForceType directive to assign the type application/x-httpd-php to it. The URL of the script then has no file ending at all, and the parameter is again read from the path. All search engines index such pages without problems, because they can no longer be told apart from static HTML pages.
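
The manual extraction of the parameter from the path could be sketched in PHP roughly as follows. This is only a minimal sketch, assuming PHP 7 or later, where the value is read from $_SERVER['PATH_INFO'] rather than from a global $PATH_INFO. The helper name path_info_segments and the fallback category "default" are made up for illustration and do not come from the article.

```php
<?php
// Hypothetical helper: split the extra path that follows the script name
// (e.g. "/PHP/" in /script.html/PHP/) into its non-empty segments.
function path_info_segments(string $pathInfo): array
{
    $segments = [];
    foreach (explode('/', $pathInfo) as $segment) {
        // Leading, trailing and doubled slashes produce empty pieces; skip them.
        if ($segment !== '') {
            $segments[] = $segment;
        }
    }
    return $segments;
}

// PATH_INFO is not set when the URL carries no extra path,
// so fall back to an empty string in that case.
$segments = path_info_segments($_SERVER['PATH_INFO'] ?? '');

// Assumption for this sketch: the first segment is the category,
// and "default" selects the standard page when no category is given.
$category = $segments[0] ?? 'default';
```

With a request for /script.html/PHP/, the path information would be "/PHP/", so $category would end up as "PHP". A request without any extra path falls back to the standard page.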

Making magic with Mod_Rewrite
With Mod_Rewrite it is possible to do without the manual handling of the path information. Ralf S. Engelschall's Mod_Rewrite can rewrite URLs on the fly; because regular expressions can be used in the rewrite rules (that is, the instructions according to which the URLs are rewritten), almost anything imaginable can be done. Further information can be found in the Mod_Rewrite documentation. Please note that this module is not compiled into Apache by default; you have to pass the following option to the configure script to compile Mod_Rewrite as well: --activate-module=src/modules/standard/mod-rewrite.o

For our purposes a few simple rewrite rules are sufficient. First the rewrite engine has to be switched on. To do this, put the following configuration directive into a .htaccess file:

RewriteEngine on

With the following rule, all URLs of the form news<id>.html are transformed into shownews.php?id=<id>. So news01.html becomes shownews.php?id=01:

RewriteRule ^news(.*)\.html$ shownews.php?id=$1

Your script can access the variable $id as usual. The user's browser does not notice the change: for the browser, the file is still called news01.html.
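
On the script side, reading the rewritten parameter might look roughly like the sketch below. It assumes PHP 7.1 or later, where request parameters are no longer registered as global variables, so the id is taken from $_GET. The function name parse_news_id and the digits-only rule are assumptions made for illustration; the article itself does not say how shownews.php validates its input.

```php
<?php
// Hypothetical helper for shownews.php: return the id only if it consists
// of digits (as in news01.html -> id=01); otherwise return null.
function parse_news_id(array $query): ?string
{
    $id = $query['id'] ?? null;
    // The D modifier keeps "$" from matching before a trailing newline.
    if (!is_string($id) || preg_match('/^[0-9]+$/D', $id) !== 1) {
        return null;
    }
    return $id;
}

$id = parse_news_id($_GET);

// Without a usable id, fall back to a standard page instead of failing.
$page = ($id === null) ? 'standard page' : 'news item ' . $id;
```

Because the first rule captures everything between "news" and ".html", a request such as newsabc.html would also reach the script; a check like this lets it answer with the standard page in that case.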

Another example:

RewriteRule ^(.*)\.html$ shownews.php?id=$1

This line transforms URLs like foo.html into shownews.php?id=foo.
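
To see what the two patterns capture, they can be tried out in PHP with preg_replace, which uses a similar regular expression syntax. This only imitates the pattern matching and is not how Apache applies its rules; the function name simulate_rewrite is made up for this sketch.

```php
<?php
// Hypothetical helper: apply one pattern/substitution pair to a path.
// preg_replace returns the path unchanged when the pattern does not match.
function simulate_rewrite(string $pattern, string $target, string $path): string
{
    return preg_replace($pattern, $target, $path);
}

// The two patterns from the rules above, wrapped in PCRE delimiters.
$newsRule = '/^news(.*)\.html$/';
$catchAll = '/^(.*)\.html$/';
$target   = 'shownews.php?id=$1';

$a = simulate_rewrite($newsRule, $target, 'news01.html'); // shownews.php?id=01
$b = simulate_rewrite($catchAll, $target, 'foo.html');    // shownews.php?id=foo
$c = simulate_rewrite($catchAll, $target, 'news01.html'); // shownews.php?id=news01
$d = simulate_rewrite($newsRule, $target, 'index.html');  // index.html (no match)
```

The third call shows that the second pattern also matches news01.html, but captures "news01" rather than "01". If both rules were active in the same .htaccess file, the result would therefore depend on the order in which they are listed.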

With a few tricks it is possible to make spiders and robots believe they have found static pages, which they then index in the usual way. The methods presented in this article can easily be integrated into your own scripts and, with the appropriate adaptations, they also work with other server-side scripting languages without problems.