  1. #1
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Looking for a Good Robots.txt file
    Is there a robots.txt file out there that is configured to stop all the nasty bots from crawling a site? If not, is there anyone who would be willing to contribute to the creation of one? We could use it here at ABW to try to cut down on wasted bandwidth and help make logs and reports more accurate.


    Thanks,
    Doug

  2. #2
    Lite On The Do, Heavy On The Nuts Donuts's Avatar
    Join Date
    January 18th, 2005
    Location
    Winter Park, FL
    Posts
    6,930
    A robots.txt file contains instructions for crawlers that are interested in listening to what you tell them to do, and not do, on your site - it doesn't block anybody who's intentionally nasty.
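    For illustration, here's a minimal sketch of what a robots.txt looks like for a crawler that does listen (the bot name is a placeholder):

    Code:
    # Ask one specific (compliant) crawler to stay out of the whole site.
    User-agent: BadBot
    Disallow: /

    # Everyone else may crawl everything.
    User-agent: *
    Disallow: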

  3. #3
    Affiliate Manager PianoWizard.com's Avatar
    Join Date
    January 22nd, 2007
    Posts
    31
    Quote Originally Posted by simplistechs
    Is there a robots.txt file out there that is configured to stop all the nasty bots from crawling a site? If not, is there anyone who would be willing to contribute to the creation of one? We could use it here at ABW to try to cut down on wasted bandwidth and help make logs and reports more accurate.

    Thanks,
    Doug
    Doug,

    Here's the one we use: http://www.pianowizard.com/robots.txt.

    I pieced it together from several sources, the primary being WebmasterWorld's version.

    It blocks all the nasty bots that are WILLING to be blocked.

    Enjoy!

  4. #4
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Thanks, that's what I was looking for.

  5. #5
    Affiliate Manager PianoWizard.com's Avatar
    Join Date
    January 22nd, 2007
    Posts
    31
    Happy to help!

  6. #6
    SEO: A Specialty - Web Design: Slow or outsourced andbeyond's Avatar
    Join Date
    June 18th, 2006
    Location
    The Call is coming from Inside the House!
    Posts
    1,332
    Abestweb uses something similar to block bots:

    http://forum.abestweb.com/robots.txt

    But did you check your statistics to see which bot(s) it is? Sometimes Google and Yahoo (Slurp) just love some sites and can't stop reading them. And I bet you don't want to exclude those.

    If there are folders you don't want indexed, like /search, /images, /cgi-sys, or something else, you can exclude them, and that might calm some of the bots down a bit.
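    As a sketch, those directory exclusions would look like this in robots.txt (the folder names are just examples):

    Code:
    # Ask all compliant crawlers to skip these directories.
    User-agent: *
    Disallow: /search/
    Disallow: /images/
    Disallow: /cgi-sys/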

    Good Luck

  7. #7
    Affiliate Manager adambha's Avatar
    Join Date
    October 20th, 2006
    Posts
    301
    Quote Originally Posted by Donuts
    ...crawlers that are interested in listening to what you tell them to do...it doesn't block anybody who's intentionally nasty.
    Yeah, that's why it's better to block them outright. Add the following to your .htaccess file:

    Code:
    # Tag any request whose User-Agent matches one of these (case-insensitive) as bad_bot.
    SetEnvIfNoCase User-Agent ^BadRobotUserAgent1$ bad_bot
    SetEnvIfNoCase User-Agent ^BadRobotUserAgent2$ bad_bot
    SetEnvIfNoCase User-Agent ^BadRobotUserAgent3$ bad_bot
    
    # Deny requests tagged bad_bot; allow everyone else.
    order allow,deny
    deny from env=bad_bot
    allow from all
    Obviously, replace the BadRobotUserAgent placeholders with your own list of bad-bot user agents.

    This will literally block them (whether they like it or not), returning a 403 Forbidden to their requests.
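    Note: the order/deny/allow syntax above is for Apache 2.2 and earlier. On Apache 2.4, the equivalent (a sketch, reusing the same SetEnvIfNoCase lines) would use Require instead:

    Code:
    # Apache 2.4 equivalent of the order/deny/allow block above.
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>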

  8. #8
    Roll Tide mobilebadboy's Avatar
    Join Date
    January 18th, 2005
    Location
    Mobile, Alabama
    Posts
    1,220
    Yeah, I prefer .htaccess; with robots.txt, only those (few) bots that actually pay attention to it will be blocked. If anyone wants my list, here it is (though it needs additions). I'd suggest browsing it to make sure nothing's being blocked that you don't want blocked; I only try to ban things I deem useless.

    I do need to update it, as I've run across several new amateur bots that are now scattered through various .htaccess files.

    Disclaimer: I don't claim this is a perfect list, nor do I claim it works perfectly. But I haven't noticed any of these bots on the site(s) I run this list on since I added them.

    RewriteEngine On
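    # Each RewriteCond below matches against the request's User-Agent,
    # case-insensitively ([NC]); [OR] chains the conditions, and the final
    # RewriteRule returns 403 Forbidden ([F]) when any one of them matches.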

    RewriteCond %{HTTP_USER_AGENT} "192.comAgent" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*1Noonbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*4arcade.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Accoona.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*aipbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*AlkalineBOT.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Apexoo.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*appie.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*almaden.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Baidu.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*BecomeBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "blogsearchbot-martin-1" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "BlogStreetBot" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^bot/.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*BrowserEmulator.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "BruinBot" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*BusyBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Cazoodle.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*ccubee.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*cell-phone.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Crawllybot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "DA 7.0" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*dejavu.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*DiamondBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*dragonfly.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Dulance.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*EasyDL/.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*EbiNess.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*EmeraldShield.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*envolk.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Exabot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Factbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FAST.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*findlinks.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Gaisbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*genieBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Gigabot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Gigablast.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Girafabot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*GOFORITBOT.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*heritrix.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTTP/1.0$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*i1searchbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "iaea.org" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*ichiro.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*IlTrovatore.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ".*Indy Library.*" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*IP2MapBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*IRLbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Jakarta.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Java/(.+) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Jyxobot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*KBeeBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*KFSW-Bot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Krugle.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*lanshanbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*larbin.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*libcurl.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*libwww-perl.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*LocalcomBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*lucene.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*lworldmedia.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*lwp-trivial.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*MileNSbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Missigua.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*MJ12bot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Moreoverbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*MQbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "MSRBOT" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*MultiCrawler.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "MVAClient" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*MyFamilyBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NaverBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Probe$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NetResearchServer.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*nexen.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NextGenSearchBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NG-Search.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "nicebot" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NimbleCrawler.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*noxtrumbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*NPBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Nutch.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*OmniExplorer.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*onCHECK.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*OutfoxBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*panscient.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Pingdom.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*PlantyNet.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*POE-Component-Client.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*psbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "psycheclone" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "PycURL" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*RAMPyBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*SBIder.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*ScSpider.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Scumbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Seznam.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Shim-Crawler.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*SCEJ\ PSP\ BROWSER.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Szukacz.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Snapbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Snappy.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Snoopy.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*SocietyRobot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*sogou.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*sproose.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*SurveyBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Susie.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*SynooBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Syntryx.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*TerrawizBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*T-H-U-N-D-E-R-S-T-O-N-E.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*thumbshots.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*TurnitinBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Twiceler.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*updated.com.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Vespa.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Visbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Voila.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*voyager.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*W3Compositionbot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*WebaltBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} "WebSauger" [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Websquash.com.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*WebStripper.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*WebVulnCrawl.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*WinkBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WIRE.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*wwwster.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*YodaoBot.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*yoogli.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*zedzo.* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} .*Zeusbot.* [NC]
    RewriteRule ^.*$ - [F]

    Shawn Kerr (.com) | Disney World | SEC Football

  9. #9
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Do I just copy and paste it into my .htaccess file?

    Thanks
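    For reference, the rules in this thread go straight into a .htaccess file at the site root; a minimal self-contained sketch (with placeholder bot names) would be:

    Code:
    # Minimal .htaccess sketch: return 403 to the listed user agents.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} BadRobotUserAgent1 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} BadRobotUserAgent2 [NC]
    RewriteRule ^.*$ - [F]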
