  1. #1
    jazzylee77 (Full Member)
    Join Date: February 19th, 2005
    Posts: 199
    Super Secret Spider Command
    I'm installing an on-site search engine, and I want to exclude certain page types that this Perl script (DySE) generates. Basically, I just want it to index the item pages.

    I can't use the robots meta tag without also excluding other bots.
    I can't get specific enough with robots.txt to allow particular files within folders, or particular file types.

    That leaves the search script's own tag, which can be placed before and after the content to be excluded.

    The problem is that the tags look like this:

    Code:
    <!--crawlername_noindex--> Content to not be indexed here <!--/crawlername_noindex-->
    If I just drop that into the DySE templates, the closing tag appears in the page source but the opening tag does not. I guess DySE strips it out as a command or something.

    How can I get that tag through? I thought about virtual includes, but they won't pass comments, and I don't know of any way to escape it. Maybe some feature of the script would let the tag through?
    Last edited by jazzylee77; January 17th, 2008 at 08:59 PM.

  2. #2
    jazzylee77
    I wonder if this would work:

    User-agent: freakyspiderthing
    Disallow: /*/
    Disallow: /*/*/
    Disallow: /*/*/*/
    Disallow: /*/*/*/*/
    Allow: /*.html
    Allow: /*/*.html
    Allow: /*/*/*.html
    Allow: /*/*/*/*.html
    Allow: /*/*/*/*/*.html

  3. #3
    jazzylee77
    Quoting the robots.txt rules from post #2:
    Answer: it didn't work. It's indexing all the folders.
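    One possible reason the rules in post #2 had no effect: the original 1994 robots.txt convention treats each Disallow value as a literal path prefix and defines neither `*` wildcards nor `Allow`, so a crawler that follows only that convention reads `/*/` as a prefix ordinary URLs never start with. Crawlers that follow the later RFC 9309 rules expand `*` and let the longest matching rule win, with Allow winning ties. The Python sketch below compares the two readings of the post #2 rules; the function names are hypothetical, and which reading this particular crawler uses is an assumption the thread does not settle.

```python
import re

# Rules copied from post #2, as (directive, path pattern) pairs.
RULES = [
    ("disallow", "/*/"),
    ("disallow", "/*/*/"),
    ("disallow", "/*/*/*/"),
    ("disallow", "/*/*/*/*/"),
    ("allow", "/*.html"),
    ("allow", "/*/*.html"),
    ("allow", "/*/*/*.html"),
    ("allow", "/*/*/*/*.html"),
    ("allow", "/*/*/*/*/*.html"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate an RFC 9309-style path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal. Used with .match(), it is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules=RULES) -> bool:
    """RFC 9309 reading: the longest matching pattern wins; ties go to allow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


def is_allowed_literal(path: str, rules=RULES) -> bool:
    """1994 reading: Disallow values are literal prefixes; Allow is ignored."""
    return not any(
        directive == "disallow" and path.startswith(pattern)
        for directive, pattern in rules
    )
```

    Under the RFC 9309 reading, `/items/widget.html` comes out allowed and the folder URL `/items/` blocked, roughly what post #2 intended; under the literal-prefix reading both are allowed, which would be consistent with the crawler indexing every folder.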

  4. #4
    cusimano (ABW Ambassador)
    Join Date: January 18th, 2005
    Location: Toronto, Canada
    Posts: 1,369
    The problem is that DySE view.pl treats <!--crawlername_noindex--> as a reference to an undefined configuration variable, so it replaces it with the empty string.

    Try the following workaround. Edit (or create) make-ini.txt and add:

    crawlername_noindex "<!--crawlername_noindex-->"

    Then run make.pl so that make-ini.txt is read in.

    This statement defines crawlername_noindex to have the value <!--crawlername_noindex-->.

    With that definition, DySE replaces <!--crawlername_noindex--> with <!--crawlername_noindex-->, that is, with itself. DySE view.pl keeps re-applying the substitution to its own result, but stops after 10 loop iterations, so the comment ends up in the page unchanged.

    Alternatively, if you can configure your search software to recognize a different comment, change it to something DySE will not interpret as a configuration-variable reference, such as <!--_crawlername_noindex-->. DySE recognizes variable references with the following regular-expression pattern:
    <!--[a-z][a-z0-9_-]*(\.[a-z0-9_-]+)*-->
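    To see why the opening tag vanished while the closing tag survived, here is a minimal Python model of the substitution described in this post. It is a sketch under stated assumptions, not DySE's actual Perl code: it reuses the quoted pattern (matched case-sensitively, which the post does not specify), treats undefined names as empty strings, and re-runs the substitution while anything matched, up to 10 passes. The names `expand`, `VAR_REF` and `MAX_PASSES` are hypothetical.

```python
import re

# The variable-reference pattern quoted above, with the name captured.
# Assumption: matched case-sensitively.
VAR_REF = re.compile(r"<!--([a-z][a-z0-9_-]*(?:\.[a-z0-9_-]+)*)-->")
MAX_PASSES = 10  # loop limit described in the post


def expand(text: str, variables: dict[str, str]) -> str:
    """Replace <!--name--> with variables[name] ('' if undefined).

    The substitution is re-applied to its own result as long as something
    matched, but never more than MAX_PASSES times.
    """
    for _ in range(MAX_PASSES):
        text, count = VAR_REF.subn(lambda m: variables.get(m.group(1), ""), text)
        if count == 0:  # nothing matched: the text is stable
            break
    return text


page = "<!--crawlername_noindex-->Secret<!--/crawlername_noindex-->"

# Undefined variable: the opening comment becomes '', the closing one stays
# because '/' is not in [a-z].
print(expand(page, {}))

# make-ini.txt-style self-definition: the comment maps to itself every pass.
print(expand(page, {"crawlername_noindex": "<!--crawlername_noindex-->"}))
```

    With no definitions, the first call prints `Secret<!--/crawlername_noindex-->`, matching the symptom in post #1. With the self-definition the text comes back unchanged after the 10 passes, and comments such as `<!--_crawlername_noindex-->` or `<!--#include virtual="..."-->` never match the variable pattern at all.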

    Yours truly,
    Cusimano.Com Corporation
    per: David Cusimano

  5. #5
    jazzylee77
    Neat! Can I use the same kind of trick to read in PHP code?

  6. #6
    cusimano
    No, you cannot include PHP code. But you can include the output from a PHP URL.

    DySE view.pl is a Perl script, so the server runs it with the Perl interpreter, not the PHP interpreter. The Perl interpreter sends the output of view.pl directly to the user; any PHP statements in that output stream are not interpreted but are sent to the user as-is.

    DySE view.pl does, however, support SSI statements, so you can include the output of any URL, including a PHP URL, in the output stream. For example, you could put the following SSI statement in your template file:

    <!--#include virtual="/inc/counter.php"-->

    DySE view.pl will issue an HTTP request to your server, read the output of that URL (/inc/counter.php), and insert it into the output stream sent to the user. (Actually, it is inserted into the output buffer, and the buffer is scanned again for further substitutions; once no more substitutions are possible, the buffer has stabilized and is sent to the user. This is what makes nested includes and nested configuration-variable definitions possible.)
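    The re-scan-until-stable behaviour described above can be sketched in Python as a loop that expands `<!--#include virtual="..."-->` directives until none remain. This is a hypothetical model, not DySE's implementation: `expand_includes` and its `fetch` parameter are invented names, `fetch` stands in for the HTTP request view.pl makes, and the `max_passes` guard is an assumption added so a page that includes itself cannot loop forever (the post only says the buffer is scanned until it stabilizes).

```python
import re
from typing import Callable

# Matches SSI directives such as <!--#include virtual="/inc/counter.php"-->.
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(buffer: str, fetch: Callable[[str], str],
                    max_passes: int = 10) -> str:
    """Replace each include directive with fetch(url), then re-scan the buffer
    so includes inside fetched output are expanded too (nested includes)."""
    for _ in range(max_passes):
        buffer, count = INCLUDE.subn(lambda m: fetch(m.group(1)), buffer)
        if count == 0:  # no directives left: the buffer has stabilized
            break
    return buffer


# Stand-in for the web server: hypothetical URLs mapped to their output.
pages = {
    "/inc/counter.php": 'Visitors: <!--#include virtual="/inc/n.php"-->',
    "/inc/n.php": "42",
}
template = '<p><!--#include virtual="/inc/counter.php"--></p>'
print(expand_includes(template, pages.__getitem__))
```

    The first pass pulls in the counter output, which itself contains an include; the second pass resolves that, so the result is `<p>Visitors: 42</p>`. In a live setup `fetch` could perform the HTTP GET itself, for example with `urllib.request.urlopen(base + url).read().decode()`, where `base` is a hypothetical site root URL.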


  7. #7
    jazzylee77
    I think I knew that (I was hoping for a loophole). Thanks for the clear explanation.
