  1. #1
    Full Member
    Join Date
    January 18th, 2005
    Location
    Des Moines, IA
    Posts
    298
    It just came to me (at 1:30am)... why not put a search box on my custom error page? It seems like a good idea. Any reason I shouldn't?

    Ray

    Finally figured out why people spend $2.00 apiece on those
    little bottles of Evian water. Just spell it backwards for the answer.

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    Good idea. That's all.
    You could even go one step further and parse the referring URL for search terms, then run the search for the visitor automatically.
    I'm keeping that code to myself though. :D
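    For reference, a rough PHP sketch of the general idea might look something like the lines below (not his actual code, which he's keeping private; the query-parameter names checked are assumptions, and real engines vary):

    <?php
    // Pull the visitor's search terms out of the referring URL, if any.
    $referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
    $terms = '';
    $query = '';

    if ($referrer !== '') {
        $parts = parse_url($referrer);
        if (is_array($parts) && isset($parts['query'])) {
            $query = $parts['query'];
        }
    }

    if ($query !== '') {
        parse_str($query, $params);
        // Common search-term parameters -- an assumption, not a complete list.
        foreach (array('q', 'p', 'query') as $key) {
            if (!empty($params[$key])) {
                $terms = $params[$key];
                break;
            }
        }
    }

    if ($terms !== '') {
        // Pre-fill the error page's search box, or run the site search directly.
        echo 'Looking for: ' . htmlspecialchars($terms);
    }
    ?>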

  3. #3
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    Yeah - I'm discovering you can do lotsa things with 404 pages.

    I just built a whole site that only has a 404 page - waiting to see if it gets indexed. :D

    Are you Crazy?

  4. #4
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    I did this also with approx 18,000 pages.

    As long as you return 'HTTP/1.1 200 OK' (or whatever the exact header is), you will be alright. (Otherwise you just get heaps of pages spidered that return 404... not good.)

  5. #5
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    Pete - I tied myself in semantic knots with this, I'd appreciate your take/experience.

    The 404 error is being directed server-side to a page which in turn SSIs the script that does the clever stuff.

    My final thinking was that the error is trapped at the server (via .htaccess) and a real page is successfully delivered, so the header would be "200 OK".

    I was looking for a way to check this when the googlebots swarmed all over the site, so it became academic for the moment. Do you think they'd be getting 404s? Surely they'd have given up and not continued to revisit for thousands more pages?

    I *think* I saw them follow links within the pages - or I may be deluding myself.

    Too many other things to do. Might wait for next googledance and see if the pages appear. If not, I'll revisit my technique.

    Are you Crazy?

  6. #6
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    Note: this is an old method I used... I now use other methods which don't rely on 404s etc.

    I know the method you are using, and I'm pretty sure that googlebot will see your pages as 404s (not found).

    The method I used is basically this.

    I have a 404 page called script.php

    there are basically no other files on the server except for this file (& images etc).

    So, on the main page, I can link to mysite.com/manufacturer/1.html (or index.html, which shows a list of all manufacturers).

    There is, of course, no such file, nor even a directory called manufacturer.
    A 404 code is generated because the file is not found, and the page I've specified in my .htaccess (script.php) is the page which is loaded.
    I then parse the requested URL (the URL, not the script), so I have 'manufacturer' and '1.html' in variables and know that I need to show the details for manufacturer #1. (If there is no 1.html or index.html, I show a list of all manufacturers.)
    The important thing to note is that the visitor (or spider) DOES receive a 404 Not Found at this point. (You are just outputting data for a 404 page... lots of sites do this: "whoops, we can't find page blah.html" etc.)
    You MUST output, in PHP, header("HTTP/1.1 200 OK"); before any other output. There are similar commands in Perl, but I can't recall which. (print "Header: HTTP/1.1 200 OK\n\n"???)
    This overwrites the 404 NOT FOUND, and tells the browser/spider that domain.com/manufacturer/1.html is in fact a valid page.

    This is long-winded (it's late), and like I said, I don't use this anymore, because every page generates an entry in the error logs and because of possible server overhead.

    [edit to add]
    quote: "My final thinking was that the error is trapped at the server (via .htaccess) and a real page is successfully delivered, so the header would be '200 OK'."
    Here's the short version :)
    Just because a page is delivered doesn't mean the status for the page (in the header) changes from 404 to 200.
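    For reference, a minimal PHP sketch of the setup described above might look like the following, assuming an .htaccess line of "ErrorDocument 404 /script.php"; the echoed output is just a placeholder for the real page rendering:

    <?php
    // script.php -- the page .htaccess serves whenever a URL is not found.
    // The 200 status must go out BEFORE any other output, or the spider still sees a 404.
    header("HTTP/1.1 200 OK");

    // The URL actually requested, e.g. /manufacturer/1.html
    $parts = parse_url($_SERVER['REQUEST_URI']);
    $path  = isset($parts['path']) ? trim($parts['path'], '/') : '';
    $bits  = explode('/', $path);   // e.g. array('manufacturer', '1.html')

    if (isset($bits[0]) && $bits[0] == 'manufacturer'
            && isset($bits[1]) && $bits[1] != '' && $bits[1] != 'index.html') {
        $id = (int) basename($bits[1], '.html');   // '1.html' -> 1
        echo "Details for manufacturer #$id";      // placeholder: render the detail page here
    } else {
        echo "List of all manufacturers";          // placeholder: render the index here
    }
    ?>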

  7. #7
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    Thanks Pete

    That's pretty much what I'm doing (in terms of pointing to pages that don't exist then serving stuff up according to what was asked for).

    Problem was - the name of the 404 page was pre-ordained on this site, which is why I called my script from within it rather than running it directly - which would have been obvious and preferable. Printing the correct headers is straightforward enough - but I didn't think there would be any point from a script within an error page.

    I'll have more flexibility on my dedicated server - though I'm conscious of the error log/server load issues. Something that started as an interesting experiment got out of control. Those googlebots did gobble it up, so I may prioritise setting this up on the server to see if I can get anything from it.

    Are you Crazy?

  8. #8
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    OK.. didn't see the SSI part.
    I'm imagining that by the time the SSI parts are being processed, there has already been output, and it's then too late to output the headers. hmmm

  9. #9
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    We'll see - I'll set it up on the ded where I can .htaccess the 404s to a script and output full headers.

    I've got lots of pages pointing to it, so worst case the googlebots might come back next month and find the pages properly.

    If they appear in the index at the start of Dec, they were probably ok anyway (but I suspect they're showing as 404s).

    Maybe I should stop trying to be too clever and just go make some pages :eek:

    Are you Crazy?

  10. #10
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    I knew there was a tool somewhere..

    WebMasterWorld has a tool where you can check the headers returned from a page.
    I think you might need to be a member to use it though (just sign up)
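    If you'd rather check from your own machine, PHP can do the same job; a quick sketch (the URL below is just an example):

    <?php
    // Print the status line and headers a URL actually returns.
    $url = 'http://www.example.com/manufacturer/1.html';   // example URL only
    $headers = get_headers($url);   // first element is e.g. "HTTP/1.1 200 OK"

    if ($headers === false) {
        echo "Request failed\n";
    } else {
        foreach ($headers as $line) {
            echo $line . "\n";
        }
    }
    ?>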

  11. #11
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    D'oh :mad:

    That tool confirmed the current setup returns 404s - but the embedded script (now) returns 200s.

    I'll transfer this to the ded tomorrow first thing and see if the googlebots will give me another chance.

    Thanks Pete - at least I can fix it now :D

    Are you Crazy?

  12. #12
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    ÄúsTrálíĺ
    Posts
    1,372
    Calling the script on its own will generate a 200 OK header.

    But calling it from a 404'd page will return a 404 NOT FOUND unless you overwrite that with the 200 OK header (as posted above).
    As long as the headers aren't already sent (i.e. you can still output Content-type etc.), you should be able to send a 200 OK header, and your 404 page will then return a 200. :)
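    In PHP terms, that "headers aren't already sent" condition can be checked explicitly; a small sketch:

    <?php
    // Inside the page being served for the 404:
    if (!headers_sent()) {
        header("HTTP/1.1 200 OK");   // replaces the 404 status
    } else {
        // Too late -- some output (even a stray space before an include)
        // has already pushed the 404 headers to the client.
    }
    ?>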

  13. #13
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    Sure - I understand. On the dedicated server I have the control to say that I want the script dished up in response to 404s (instead of being forced to use 404.shtml as I am on the webhost), and the first thing it will do is output "HTTP/1.1 200 OK".

    Are you Crazy?

  14. #14
    ABW Ambassador CrazyGuy's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,463
    Closing the loop on this for anyone who finds it in a search sometime in the future ...

    For complete control over HTTP headers in Perl, you have to call your script nph-xxxxxxxxx.pl. This tells the server not to parse the headers (nph, geddit?), so your script sends the complete HTTP response headers itself.

    So, if the ErrorDocument in your .htaccess is a Perl script, it will still produce a 404 error header unless you go the nph route and make your first few lines something like:

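    # nph- script: the server passes this output straight through, so these lines form the actual HTTP response headers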
    print "HTTP/1.1 200 OK\n";
    print "Server: $ENV{'SERVER_SOFTWARE'}\n";
    print "Content-type: text/html\n\n";

    Be warned - there are risks in using nph scripts. They don't stop unless they're told to, so you have to make sure there's no danger of a script looping indefinitely.

    exit (0);

    is a good final statement. Searches will bring up many useful references now you know what you're looking for :D

    Are you Crazy?
