  1. #1
    ABW Ambassador Sam Bay's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,603
    Some of the pages on my dynamic site are not crawled by Google, while others are. After looking into it, I've found that if there is a space between the words in the URL, such as "Gifts & Flowers", GoogleBot is not able to crawl the page, at least in my case.

    For example:
    domainname.com/categories.ASP?MainCategory=Collectibles&SubCategory=Coins
    is crawled,

    domainname.com/categories.ASP?MainCategory=Gifts+%26+Flowers&SubCategory=Flowers
    is NOT crawled.

    Is there anything I can do to change this so Google can crawl it?

    Thanks...

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    Take the space out of the URL?

    Search Engine Positioning - 1 Design 4 Life

  3. #3
    Newbie
    Join Date
    January 18th, 2005
    Posts
    24
    One of the reasons is that a space is NOT a valid character in a URL. IE automatically inserts %20, the hex equivalent of a space, so that there is no literal space in the URL.

    A space is generally considered an invalid character in filenames.

    As a side note, some browsers will not be able to load pages that have literal spaces in the URL, since that is not valid per the URL standard.

    Additionally, anything after an & is usually interpreted as the start of a new parameter. For example,

    domain.com/index.php?category=Flowers_and_Gifts&subcategory=Flowers&nested=Roses

    Many spiders seem to stop as soon as they encounter the first &, and some even seem to stop as soon as they encounter the ?
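
    To make that concrete: in Classic ASP (which the site above uses), escaping the value before it goes into the link would look something like this. This is only a minimal sketch, and the page and parameter names are just examples:

    <%
    ' Request.QueryString splits parameters at every raw "&", so an "&"
    ' inside a value has to be escaped as %26 (and a space as "+" or "%20")
    ' before the link is written out.
    raw = "Gifts & Flowers"

    Response.Write "categories.asp?MainCategory=" & Server.URLEncode(raw)
    ' writes: categories.asp?MainCategory=Gifts+%26+Flowers

    ' On the receiving page, Request.QueryString("MainCategory") decodes
    ' this back to "Gifts & Flowers" automatically.
    %>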

  4. #4
    ABW Ambassador Sam Bay's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,603
    Mark,
    How? The URL is dynamically generated by the code, and in the database there are spaces between the words, as there should be.

    Thanks, very much Robert,

    Originally posted by Robert:
    "As a side note, some browsers will not be able to load pages that have literal spaces in the URL, since that is not valid per the URL standard."

    >> Yes, we made a change in the URL code, because Netscape browsers were not able to read it.

    Originally posted by Robert:
    "Additionally, anything after an & is usually interpreted as the start of a new parameter. For example,

    domain.com/index.php?category=Flowers_and_Gifts&subcategory=Flowers&nested=Roses

    Many spiders seem to stop as soon as they encounter the first &, and some even seem to stop as soon as they encounter the ?"

    >> GoogleBot does not stop at the & sign. As I pointed out, it's the spaces between the words that are causing it to stop.

    Is there any way I can replace the spaces with something GoogleBot can read?


    And Google's response:
    The Google index does include pages that have question marks in their URLs, including dynamically generated pages. However, these pages comprise a very small portion of our index. Pages that contain question marks in their URLs can cause problems for our crawler and may be ignored. If you suspect that URLs containing question marks are causing your pages not to be included, you may want to consider creating copies of those pages without question marks in the URL for our crawler. If you do this, please be sure to use robots.txt to block our crawler from the pages that have question marks in the URL, to ensure that these pages are not seen as having duplicate content.
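
    If I go the static-copies route, something like this in robots.txt should keep the question-mark versions out of the crawl. Just a sketch; the file name is the one from my example URLs, and since robots.txt matches by prefix this would block every categories.ASP URL that carries a query string:

    User-agent: Googlebot
    Disallow: /categories.ASP?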

    SamBay

  5. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Australia
    Posts
    1,372
    So make your script replace all spaces with a _ when it outputs the URL.

    When you read the URL variable, simply replace all underscores back to spaces.
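
    In Classic ASP that could look something like this (a minimal sketch; the parameter name is just an example, and any & inside the value would still need escaping separately):

    <%
    ' Writing the link: spaces out, underscores in.
    dbValue = "Coin Collectibles"
    Response.Write "categories.asp?SubCategory=" & Replace(dbValue, " ", "_")
    ' writes: categories.asp?SubCategory=Coin_Collectibles

    ' Reading the variable on the target page: underscores back to spaces.
    SubCategory = Replace(Request.QueryString("SubCategory"), "_", " ")
    %>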

    Easy Peasy

  6. #6
    Full Member
    Join Date
    January 18th, 2005
    Posts
    339
    SamBay,

    Instead of using the "+" in the parameters, use the correct escape character of %20 for a space.
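
    Something like this would do it in Classic ASP (a minimal sketch; the parameter name is just an example). Server.URLEncode escapes a space as "+", so the "+" is swapped for "%20" afterwards:

    <%
    category = "Gifts & Flowers"
    encoded = Replace(Server.URLEncode(category), "+", "%20")
    Response.Write "categories.asp?MainCategory=" & encoded
    ' writes: categories.asp?MainCategory=Gifts%20%26%20Flowers
    %>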

    Jim in Texas

    Patriot, Army Type, One Each...
    USA - This We'll Defend

  7. #7
    ABW Ambassador Sam Bay's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,603
    Thanks Pete, Jimbet.

  8. #8
    ABW Ambassador iucpxleps's Avatar
    Join Date
    January 18th, 2005
    Posts
    648
    I have a similar problem.. I DO have pages in SEs with spaces, but they are in the cache as bla.php?bla=blabla%20blabla, so when a user finds my page and clicks its link, it results in a nice 404 error hehe.. BTW, that is not Google..

    He who steals a minaret prepares a proper
    cover beforehand, said of someone who intends to do something illegal.

  9. #9
    Newbie
    Join Date
    January 18th, 2005
    Posts
    16
    Why feed Google (or any other spider) a URL that looks like

    http://somedomain.com/index.php?category=flowers

    when you can give them

    http://somedomain.com/category/flowers/

    and have mod_rewrite 'rewrite' the request back to
    http://somedomain.com/index.php?category=flowers

    Now your dynamic URL looks static. Unfortunately, I'm not sure how one could accomplish the same thing using something other than Apache.
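
    For reference, the rule itself can be as simple as this in an .htaccess file (a minimal sketch, assuming index.php and a single "category" parameter as in the example above):

    RewriteEngine On
    RewriteRule ^category/([^/]+)/?$ /index.php?category=$1 [L,QSA]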

  10. #10
    ABW Ambassador Sam Bay's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,603
    Kpuc,

    Yeah, I have to use Microsoft servers because the site is in ASP.

    Actually, it would be great to convert dynamic pages into HTML, like Amazon and Bizrate do:

    apparel.bizrate.com/ ,mcc__cat_id--24000000,rf--wgg.html

    Anyone know how to do this?

  11. #11
    Pimp Duck popdawg's Avatar
    Join Date
    January 18th, 2005
    Location
    Take off eh?
    Posts
    3,249
    If you find out, please let me know too.
    I have been looking for different ways of doing this for asp as well.
    So far it looks like the only way is with a custom 404 error page, which I don't really like the idea of doing.
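
    For what it's worth, the 404 trick works roughly like this (a minimal sketch; with a custom error of type URL, IIS passes the original request to the error page in the query string as "404;http://...", and the target page name here is hypothetical):

    <%
    ' Recover the URL the visitor actually asked for.
    qs = Request.ServerVariables("QUERY_STRING")
    If Left(qs, 4) = "404;" Then
        originalUrl = Mid(qs, 5)
        ' ...parse originalUrl, pick the real page to serve, then:
        Response.Status = "200 OK"
        Server.Transfer "categories.asp"   ' hypothetical target page
    End If
    %>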

    Game on!!!!

  12. #12
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    It's pretty easy to do for ASP. You need an ISAPI filter that rewrites the URLs the same way mod_rewrite does on Apache servers.

    Luckily, there are several good products on the market for ASP, so why reinvent the wheel?

    www.qwerksoft.com has a pretty good one for under $100 that I've used in the past, but there are plenty out there, so do a search on "ISAPI filter" or "ISAPI rewrite filter" and you'll find plenty.

    Search Engine Positioning - 1 Design 4 Life

  13. #13
    Pimp Duck popdawg's Avatar
    Join Date
    January 18th, 2005
    Location
    Take off eh?
    Posts
    3,249
    Thanks Mark!!

    Game on!!!!
