  1. #1
    Newbie
    Join Date
    January 18th, 2005
    Posts
    37
    Some of the sites I have been reading mention a "robots.txt" file. Do I need to create one of these? If so, how?

    Thanks

  2. #2
    Affiliate Manager
    Join Date
    January 18th, 2005
    Posts
    1,056
    You need one if you want to ban certain bots from accessing your site. I was getting a lot of 404 errors in my logs from the search engines looking for it, so I just created a blank file, named it robots.txt, and uploaded it to my server.

  3. #3
    Newbie
    Join Date
    January 18th, 2005
    Posts
    37
    Thanks!

  4. #4
    ABW Ambassador Nature Boy's Avatar
    Join Date
    January 18th, 2005
    Location
    Tennessee
    Posts
    1,423
    It comes in handy for all those rogue spiders and bots out there. And if that doesn't keep them out, use .htaccess to ban their IP address.
    Scott
    If you can't dazzle them with brilliance, then baffle them with bulls#!t
    Don't tell me that you'll do it... SHOW ME.
    Just because everyone else is drinking it is no reason for me to drink the KOOL-AID.

  5. #5
    Affiliate Miester my2cents's Avatar
    Join Date
    January 18th, 2005
    Location
    far far away....
    Posts
    2,161
    Hi UJ,

    Go to WebMaster Central

    or Webmaster Tool Kit

    Both of these websites offer robots.txt generation tools as well as several other useful page-building tools.

    Joe
    ++++++++++++++++++++++++++++++++++++++++++
    that's my2cents, 'cuz I'm a legend in my own mind....

  6. #6
    ABW Ambassador buy_online's Avatar
    Join Date
    January 18th, 2005
    Location
    Richmond, VA
    Posts
    3,234
    First, what Joe said, and try this link:
    http://www.robotstxt.org/wc/robots.html

    Not only will it give you more control, but many feel that it is good to have the spiders/robots find the file.

    Fred

  7. #7
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    This is something that I haven't a clue on but keep wondering about. I use FrontPage for my sites. If I use the meta tag

    Name = robots
    Value = index,follow

    will that be the same as a robots.txt file or not? I seem to be getting a lot of 404s on all my sites. If that doesn't do the trick, could I just make a new page in FrontPage, name it robots.txt, add the text below to the page, and leave it at that?

    # All robots will spider the domain
    User-agent: *
    Disallow:

    Thanks for any help and suggestions all
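For what it's worth, the robots meta tag and robots.txt are separate mechanisms: the meta tag is read page by page, while spiders request /robots.txt on their own, so only a real file at that path stops those 404s. The three-line file above can be sanity-checked with Python's standard urllib.robotparser module. This is only a minimal sketch; the bot name ExampleBot and the example.com URLs are placeholders, not anything from this thread.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from the post above.
ALLOW_ALL = """\
# All robots will spider the domain
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # An empty Disallow value blocks nothing, so every path stays open.
    print(is_allowed(ALLOW_ALL, "ExampleBot", "http://example.com/"))
    print(is_allowed(ALLOW_ALL, "ExampleBot", "http://example.com/any/page.html"))
```

Both checks should come back True, since an empty Disallow line means nothing is off limits.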
    The Answer to the New York Tax Law - Repeal, REPEAL, REPEAL -

    Camping Gear and Equipment

  8. #8
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    Cut and paste this one, Packster:

    User-agent: Mediapartners-Google*
    Disallow:

    User-agent: *
    Disallow: /stats/
    Disallow: /_private/
    Disallow: /_borders/
    Disallow: /_fpclass/
    Disallow: /_overlay/
    Disallow: /_themes/
    Disallow: /_vti_bin/
    Disallow: /_vti_cnf/
    Disallow: /_vti_log/
    Disallow: /_vti_pvt/
    Disallow: /_vti_txt/
    Disallow: /images/
    Disallow: /club/

    User-agent: TurnitinBot
    Disallow: /

    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /


    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: ia_archiver
    Disallow: /

    User-agent: ia_archiver/1.6
    Disallow: /

    User-agent: Alexibot
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: WebmasterWorld Extractor
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /
    Webmaster's... Mike and Charlie

    "What have you done today to put real value into a referral click...from a shoppers viewpoint!"

  9. #9
    Full Member
    Join Date
    January 18th, 2005
    Posts
    276
    Mike: This thread caught my eye as well. The "code" you list above: would I simply cut and paste it into, say, Notepad and save the file as robots.txt? Then upload it to the root directory on my server?

    John

  10. #10
    Newbie
    Join Date
    January 18th, 2005
    Posts
    17
    Probably a stupid question, but will having a robots.txt file help you get listed in the engines?

  11. #11
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    Yes to both of your questions...

  12. #12
    ABW Ambassador ShoreMark's Avatar
    Join Date
    January 18th, 2005
    Location
    NJ, USA
    Posts
    912
    Quote, originally posted by EcomCity.com:
    "Yes to both of your questions..."

    I've got a few of those listed already, Mike. Two questions: I have the .com suffix on them; is that unnecessary? And does such a long list cause any other problems with access? All of those you list look like good candidates to add.

  13. #13
    All Around Web Guy Cursal's Avatar
    Join Date
    January 18th, 2005
    Posts
    829
    Mike, are those listed allowed or disallowed?

    I don't see an Allow statement, only Disallow.
    That is why I am asking. Are those the spiders and robots you are keeping out?

    TIA

    --Brian
    Oregon Publishing: Web Development, Graphic Design, Domains & Marketing
    Deluxe Banners Bartender's Guide Cooking Jobs

  14. #14
    ABW Ambassador Andy's Avatar
    Join Date
    January 18th, 2005
    Posts
    4,178
    Brian,

    The list Mike posted above disallows those bots from access.

    An important point to remember: the robots.txt file is only good with the ethical bots that actually check it, and follow its commands. Bad bots will hit the robots.txt sometimes, and spider disallowed directories and files anyway. The only way to stop them from doing this is to ban them with .htaccess.

    Sometimes the bad bots learn where prohibited directories are located by spidering the robots.txt file, so make sure you don't give away any secrets.
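One way to spot the bots that ignore the file is to compare logged requests against the rules. Here is a rough sketch using Python's standard urllib.robotparser module. The rules, the bot names, and the (user agent, path) pairs are all made up for illustration, and a real server log would first have to be parsed into that shape.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every robot is asked to stay out of /stats/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/
"""


def find_violations(robots_txt, hits):
    """Return the (user_agent, path) hits that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, path) for agent, path in hits
            if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Made-up sample hits, not taken from any real log.
    sample_hits = [
        ("GoodBot", "/index.html"),
        ("BadBot", "/stats/usage.html"),
    ]
    for agent, path in find_violations(ROBOTS_TXT, sample_hits):
        print(f"{agent} requested disallowed path {path}")
```

Anything flagged this way is only a candidate for an .htaccess ban, since ordinary browsers never read robots.txt and would be flagged too.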

    Andy

  15. #15
    All Around Web Guy Cursal's Avatar
    Join Date
    January 18th, 2005
    Posts
    829
    Thanks Andy,

    I figured they were disallowed, but it has been a while since I did any work on my robots.txt file, and I forgot some of what the coding meant.

    Also thanks for the tips on bad bots.

    --Brian

  16. #16
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Mike, so if I just copy that into a FrontPage page, I would just name the file robots.txt? Also, does that still allow Google, MSN, and the other bots to spider? Thanks

  17. #17
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Bumpity bump. Still curious whether all the other robots, like MSN and Yahoo, will be allowed if I use Mike's code for robots.txt.

    Also, the page would just be named robots.txt, right? And would I just upload it to the server? I wouldn't need a link to it on any page for the spiders to find it? In other words, how do the robots find the .txt file? Yes, I am clueless. Thanks, all.

  18. #18
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Pack,

    I am clueless also. I just stumbled across this post and have some questions too. I copied it and posted it to my server, but now I can't see it. I named it robots.txt but did not create a new page. I know it's there; I guess I'll wait and see if it works.

    Inktomi slurped my site this morning, and I had never submitted it anywhere. But I found several "robots.txt not found" errors in my server logs.

    Any more insight would be great. I'm sort of wandering in the dark.


    Thanks for always being there,

    Spider Man

  19. #19
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    Yes, Packster, Yahoo's bot is Slurp and the new MSN bot is MSNBot - these plus Googlebot and a few others will still be able to spider your site.

    You name the text file robots.txt and upload it to public_html or equivalent directory. Good spiders will request this before they request any of your pages. This is part of the robots.txt protocol.

    There's a whole bunch of stuff in Mike's robots.txt that you might not need to exclude (i.e., a whole list of folders at the start that you might not have). Also, I wouldn't exclude Looksmart, and I rather like the Wayback Machine, so I'd let ia_archiver in as well, but it's up to you.
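To make this a bit more concrete, here is a small Python sketch using only the standard library. It derives the address where spiders look for the file, then checks a short excerpt in the style of Mike's list against a few bot names. The example.com address and the two-record excerpt are assumptions for illustration; the user-agent tokens are simply the bot names mentioned in this thread.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Short excerpt in the style of the list above (not the full file).
EXCERPT = """\
User-agent: WebZip
Disallow: /

User-agent: ia_archiver
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """Spiders look for robots.txt at the root of the host."""
    return urljoin(page_url, "/robots.txt")


def allowed_agents(robots_txt, agents, url):
    """Return the agents from `agents` that may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(robots_url("http://www.example.com/shop/tents.html"))
    bots = ["Googlebot", "Slurp", "msnbot", "WebZip", "ia_archiver"]
    print(allowed_agents(EXCERPT, bots, "http://www.example.com/"))
```

Bots that no record names are allowed by default, which is why nothing in the file has to list Googlebot, Slurp, or MSNBot. Deleting the ia_archiver record from the excerpt would let that one back in as well.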

  20. #20
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Marky,

    Thanks for the feedback. I did exclude Looksmart when I added the file to a site earlier. I saw the two ia_archiver entries and wasn't sure whether that was the Wayback Machine or not. I also like the Wayback Machine; as a matter of fact, it saved me when I lost a bunch of sites when TH crashed way back. Thanks for the advice.

  21. #21
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Hey all,

    I need some help. First, thank you for this post; I am new to AM and did not know there were bad bots. I have created a robots.txt file (copied from this post, thanks again) and uploaded it to my servers.

    My question: my most visited page is http://xxxxxxx/_vti_bin/_vti_aut/author.exe

    which I thought would be blocked from the bots.

    My third most visited is http://xxxxxxx/lines.swf

    which I also thought would be blocked.

    Did I do something incorrectly in setting up my robots file, and more importantly, is this having an effect on my being listed in these search engines?
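A quick way to see what a folder rule actually covers is to run both paths through Python's standard urllib.robotparser module. This sketch assumes a record that disallows /_vti_bin/ for all robots, along the lines of the folder list earlier in the thread; ExampleBot is a placeholder name.

```python
from urllib.robotparser import RobotFileParser

# Assumed record: all robots are asked to stay out of /_vti_bin/.
RULES = """\
User-agent: *
Disallow: /_vti_bin/
"""


def blocked(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt asks user_agent not to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Covered by the /_vti_bin/ rule.
    print(blocked(RULES, "ExampleBot", "/_vti_bin/_vti_aut/author.exe"))
    # No rule mentions this file, so compliant robots may still fetch it.
    print(blocked(RULES, "ExampleBot", "/lines.swf"))
```

Under that assumption the first path is off limits to compliant robots, while lines.swf sits in the site root and no folder rule touches it. Either way, robots.txt only affects robots that choose to obey it, so hits in your stats may come from visitors or tools that never read the file.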

    Thank you for all you do.

    Spider Man
