  1. #1
    Full Member
    Join Date
    January 18th, 2005
    Posts
    222
    Hi. I think I know how to make a robots.txt file from reading some tutorials, but I have a question. I want to exclude Slurp because I have pages in Inktomi that are similar. If I write a robots.txt to do that, do I also have to add something about robots to the webpage code itself? Or does the spider fetch the robots.txt file first and use that to decide what it may crawl?
    I am just wondering if I need to put any coding in my webpage itself. Thanks.

  2. #2
    Newbie
    Join Date
    January 18th, 2005
    Posts
    2,694
    No. If you've got a robots.txt, you don't need to put anything in the HTML page. That's the beauty of it.

    Just make sure your robots.txt syntax is correct so you don't accidentally ban something you want. I don't remember the addresses of any, but there are webpages out there where you can enter the URL of your robots.txt file and have its syntax checked (try a search for "robots.txt syntax checker").
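
    If you'd rather check a draft locally, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt text, the example.com URL and the is_allowed helper are all made up for illustration, and this only shows how Python's parser reads the file; a real crawler may interpret it differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Slurp out of the whole site, allow everyone else.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"  # placeholder URL
    print("Slurp allowed:", is_allowed(ROBOTS_TXT, "Slurp", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

    Note that the agent passed in is the short robot name (Slurp), not the full Mozilla/... string from your logs.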

    "Laziness, Impatience, Hubris. Pick any three" ~ YAPC 19100

  3. #3
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    There's a robots.txt syntax checker at www.searchengineworld.com.
    Good luck in trying to exclude pages for Slurp - there are a lot of Slurp spiders out there. Here are the main ones:

    Mozilla/3.0 (Slurp.so/Goo; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
    Mozilla/3.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
    Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
    Mozilla/5.0 (Slurp/cat; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
    Mozilla/5.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)

    The first one is the main free spider, so I guess this is the one you want to exclude. There are other variations on this, but I don't have the info to hand. Either way, you would need to exclude all Slurp.so spiders.

    The Slurp/si crawler is the one that checks that your paid inclusion pages are not doorways and Slurp/cat is the paid inclusion spider.

    Search Engine Positioning - 1 Design 4 Life

  4. #4
    Newbie
    Join Date
    January 18th, 2005
    Posts
    2,694
    I don't know much about robots.txt syntax, but would it be possible to use wildcards for that, like this?

    Mozilla/3.0 (Slurp*/*; slurp@inktomi.com; http://www.inktomi.com/slurp.html)
    Mozilla/5.0 (Slurp*/*; slurp@inktomi.com; http://www.inktomi.com/slurp.html)


  5. #5
    ABW Ambassador DesignerWiz's Avatar
    Join Date
    January 18th, 2005
    Location
    U.S.A
    Posts
    2,777
    Our service team is also pleased to offer a robots.txt checker at http://www.designerwiz.com/test/robo..._validator.htm

    DesignerWiz
    http://DesignerWiz.com

  6. #6
    ABW Ambassador FFoc's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,015
    Here's a resource, gleaned from Googlebot's homepage:
    http://www.robotstxt.org/wc/robots.html

    And, as to whether a * will suffice to exclude the bots, it depends on how standards-compliant the bot is.
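
    For what it's worth, in the original robots.txt convention the User-agent line holds just the robot's name, and a lone * is a special value meaning "any robot"; wildcards inside a name are not part of it. A rough sketch of how one parser treats the three cases, using Python's standard-library urllib.robotparser (the blocked helper and the example.com URL are made up for illustration, and other parsers or crawlers may behave differently):

```python
from urllib.robotparser import RobotFileParser


def blocked(user_agent_value: str, agent: str) -> bool:
    """True if a one-record robots.txt with this User-agent value blocks `agent`."""
    parser = RobotFileParser()
    parser.parse([f"User-agent: {user_agent_value}", "Disallow: /"])
    # Placeholder URL; any path on the site is covered by "Disallow: /".
    return not parser.can_fetch(agent, "http://www.example.com/")


if __name__ == "__main__":
    # Plain robot name: matched case-insensitively against the agent's name token.
    print("Slurp    ->", blocked("Slurp", "Slurp"))
    # Wildcards inside the name are treated as literal characters by this parser.
    print("Slurp*/* ->", blocked("Slurp*/*", "Slurp"))
    # A lone * is the catch-all record and applies to every robot.
    print("*        ->", blocked("*", "Slurp"))
```

    So with this parser at least, a single "User-agent: Slurp" record covers the Slurp variants, while the Slurp*/* form matches nothing.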
