  1. #1
    ABW Ambassador Andy
    Join Date
    January 18th, 2005
    Posts
    4,178
    I'm trying to find an up-to-date list of bad bots to ban through .htaccess. I want to block harvesters, copiers, and the like. Does anyone know of one?

    A search here and on Google didn't turn up anything.

    Thanks,
    Andy

  2. #2
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Andy,

    Try searching ABW, under the hosting section I think, for "robots.txt" or "allowing robots". There was a really good thread a while ago that I used to create a list. I think it was leader or hyder who actually posted a list on the board. If you can't find it, let me know and I will look for it. I can't right now; I'm getting ready to leave for the evening.

    Later,

    Spider Man

  3. #3
    Affiliate Miester my2cents
    Join Date
    January 18th, 2005
    Location
    far far away....
    Posts
    2,161
    Andy, I sent you a PM
    ++++++++++++++++++++++++++++++++++++++++++
    that's my2cents, 'cuz I'm a legend in my own mind....

  4. #4
    pph Expert! Gordon
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    This is a robots.txt list that Ecom Mike posted a while back:

    User-agent: 216.34.209.23
    Disallow: /


    User-agent: TurnitinBot
    Disallow: /


    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /


    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: ia_archiver
    Disallow: /

    User-agent: ia_archiver/1.6
    Disallow: /

    User-agent: Alexibot
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: WebmasterWorld Extractor
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /
    One day parasites and their ilk will be made illegal, I bet a few Lawyers will be pissed off when the day comes.
    Mr. Spitzer is fetching it nearer

    YouTrek

  5. #5
    ABW Ambassador Andy
    Join Date
    January 18th, 2005
    Posts
    4,178
    @ my2cents: Got it! Thanks!

    I know that somewhere there's a list of the bad bots, including their IP addresses, that goes in .htaccess and denies them access to your site.

    A bad bot likely won't pay any attention at all to robots.txt, and will sometimes use it to find the parts of your site you've asked bots not to visit. A friend has a script set up with a bad-bot trap that automatically bans any bot that requests a disallowed file. Pretty cool.

    I've tried robots.txt, and I'm still getting visits from disallowed bots, so I'm going to deny them at the server level and send them off to disney.com or some other place.

    I know there's a list out there someplace; I'll post it if I find it.

    Thanks everyone!

    Andy
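
    For reference, a minimal sketch of the server-level approach described above. It assumes Apache with mod_rewrite enabled and the older Order/Allow/Deny access directives (Apache 2.2 and earlier, or 2.4 with mod_access_compat loaded), and a host that permits these directives in .htaccess. The IP address and the two user-agent names are taken from the lists in this thread; www.example.com is only a placeholder redirect target.

    ```apache
    # Refuse every request from one specific IP address
    # (the address is the example that appears in the robots.txt lists above).
    Order Allow,Deny
    Allow from all
    Deny from 216.34.209.23

    # Send requests from matching user agents to another site
    # instead of serving the page. [NC] makes the match case-insensitive,
    # [OR] chains this condition with the next one.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZip [NC]
    RewriteRule ^ http://www.example.com/ [R=302,L]
    ```

    Swapping the last line for `RewriteRule ^ - [F]` would return a 403 Forbidden instead of redirecting, which avoids pushing the unwanted traffic onto someone else's server.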

  6. #6
    ABW Ambassador DesignerWiz
    Join Date
    January 18th, 2005
    Location
    U.S.A
    Posts
    2,777
    I just took a look at our robots.txt list and saw a few entries that were not included in the list above. Maybe this helps you a little.

    User-agent: 216.34.209.23
    Disallow: /
    User-agent: Alexibot
    Disallow: /
    User-agent: Aqua_Products
    Disallow: /
    User-agent: asterias
    Disallow: /
    # B
    User-agent: b2w/0.1
    Disallow: /
    User-agent: BackDoorBot/1.0
    Disallow: /
    User-agent: Bookmark search tool
    Disallow: /
    User-agent: BotALot
    Disallow: /
    User-agent: BuiltBotTough
    Disallow: /
    User-agent: Bullseye/1.0
    Disallow: /
    User-agent: BunnySlippers
    Disallow: /
    # C
    User-agent: Cegbfeieh
    Disallow: /
    User-agent: CheeseBot
    Disallow: /
    User-agent: CherryPicker
    Disallow: /
    User-agent: CherryPickerSE/1.0
    Disallow: /
    User-agent: CherryPickerElite/1.0
    Disallow: /
    User-agent: Copernic
    Disallow: /
    User-agent: CopyRightCheck
    Disallow: /
    User-agent: cosmos
    Disallow: /
    User-agent: Crescent
    Disallow: /
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /
    # D
    User-agent: DittoSpyder
    Disallow: /
    User-agent: dumbot
    Disallow: /
    # E
    User-agent: EmailCollector
    Disallow: /
    User-agent: EmailSiphon
    Disallow: /
    User-agent: EmailWolf
    Disallow: /
    User-agent: Enterprise_Search
    Disallow: /
    User-agent: Enterprise_Search/1.0
    Disallow: /
    User-agent: EroCrawler
    Disallow: /
    User-agent: es
    Disallow: /
    User-agent: ExtractorPro
    Disallow: /
    # F
    User-agent: FairAd Client
    Disallow: /
    User-agent: Flaming AttackBot
    Disallow: /
    User-agent: Foobot
    Disallow: /
    # G
    User-agent: Gaisbot
    Disallow: /
    User-agent: GetRight/4.2
    Disallow: /
    User-agent: grub
    Disallow: /
    User-agent: grub-client
    Disallow: /
    # H
    User-agent: Harvest/1.5
    Disallow: /
    User-agent: Hatena Antenna
    Disallow: /
    User-agent: hloader
    Disallow: /
    User-agent: httplib
    Disallow: /
    User-agent: humanlinks
    Disallow: /
    # I
    User-agent: ia_archiver
    Disallow: /
    User-agent: ia_archiver/1.6
    Disallow: /
    User-agent: InfoNaviRobot
    Disallow: /
    User-agent: Iron33/1.0.2
    Disallow: /
    # J
    User-agent: JennyBot
    Disallow: /
    # K
    User-agent: Kenjin Spider
    Disallow: /
    User-agent: Keyword Density/0.9
    Disallow: /
    # L
    User-agent: larbin
    Disallow: /
    User-agent: LexiBot
    Disallow: /
    User-agent: libWeb/clsHTTP
    Disallow: /
    User-agent: LinkextractorPro
    Disallow: /
    User-agent: LinkScan/8.1a Unix
    Disallow: /
    User-agent: LinkWalker
    Disallow: /
    User-agent: LNSpiderguy
    Disallow: /
    User-agent: looksmart
    Disallow: /
    User-agent: lwp-trivial
    Disallow: /
    User-agent: lwp-trivial/1.34
    Disallow: /
    # M
    User-agent: Mata Hari
    Disallow: /
    User-agent: Microsoft URL Control
    Disallow: /
    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /
    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /
    User-agent: MIIxpc
    Disallow: /
    User-agent: MIIxpc/4.2
    Disallow: /
    User-agent: Mister PiX
    Disallow: /
    User-agent: moget
    Disallow: /
    User-agent: moget/2.1
    Disallow: /
    User-agent: mozilla/4
    Disallow: /
    User-agent: mozilla/5
    Disallow: /
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 9)
    Disallow: /
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95
    Disallow: /
    User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
    Disallow: /
    # N
    User-agent: naver
    Disallow: /
    User-agent: NetAnts
    Disallow: /
    User-agent: NICErsPRO
    Disallow: /
    # O
    User-agent: Offline Explorer
    Disallow: /
    User-agent: Openbot
    Disallow: /
    User-agent: Openfind
    Disallow: /
    User-agent: Openfind data gathere
    Disallow: /
    User-agent: Oracle Ultra Search
    Disallow: /
    # P
    User-agent: PerMan
    Disallow: /
    User-agent: ProPowerBot/2.14
    Disallow: /
    User-agent: ProWebWalker
    Disallow: /
    User-agent: psbot
    Disallow: /
    # Q
    User-agent: QueryN Metasearch
    Disallow: /
    # R
    User-agent: Radiation Retriever 1.1
    Disallow: /
    User-agent: RepoMonkey
    Disallow: /
    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /
    User-agent: RMA
    Disallow: /
    # S
    User-agent: searchpreview
    Disallow: /
    User-agent: SiteSnagger
    Disallow: /
    User-agent: sootle
    Disallow: /
    User-agent: SpankBot
    Disallow: /
    User-agent: spanner
    Disallow: /
    User-agent: suzuran
    Disallow: /
    User-agent: Szukacz/1.4
    Disallow: /
    # T
    User-agent: Teleport
    Disallow: /
    User-agent: TeleportPro
    Disallow: /
    User-agent: Telesoft
    Disallow: /
    User-agent: The Intraformant
    Disallow: /
    User-agent: TheNomad
    Disallow: /
    User-agent: TightTwatBot
    Disallow: /
    User-agent: toCrawl/UrlDispatcher
    Disallow: /
    User-agent: True_Robot
    Disallow: /
    User-agent: True_Robot/1.0
    Disallow: /
    User-agent: turingos
    Disallow: /
    User-agent: TurnitinBot
    Disallow: /
    # U
    User-agent: URL Control
    Disallow: /
    User-agent: URL_Spider_Pro
    Disallow: /
    User-agent: URLy Warning
    Disallow: /
    # V
    User-agent: VCI
    Disallow: /
    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /
    # W
    User-agent: WebZip
    Disallow: /
    User-agent: WebAuto
    Disallow: /
    User-agent: WebBandit
    Disallow: /
    User-agent: WebBandit/3.50
    Disallow: /
    User-agent: WebCopier
    Disallow: /
    User-agent: WebEnhancer
    Disallow: /
    User-agent: Web Image Collector
    Disallow: /
    User-agent: WebmasterWorld Extractor
    Disallow: /
    User-agent: WebmasterWorldForumBot
    Disallow: /
    User-agent: WebSauger
    Disallow: /
    User-agent: Website Quester
    Disallow: /
    User-agent: Webster Pro
    Disallow: /
    User-agent: WebStripper
    Disallow: /
    User-agent: WebZip/4.0
    Disallow: /
    User-agent: Wget
    Disallow: /
    User-agent: Wget/1.5.3
    Disallow: /
    User-agent: Wget/1.6
    Disallow: /
    User-agent: WWW-Collector
    Disallow: /
    User-agent: WWW-Collector-E
    Disallow: /
    # X
    User-agent: Xenu's
    Disallow: /
    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /
    # Y
    # Z
    User-agent: Zeus
    Disallow: /
    User-agent: Zeus Link Scout
    Disallow: /
    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /
    Ray Thomas
    Webmaster Resources: http://DesignerWiz.com
    ABW Board Category: Programming / Coding
    http://forum.abestweb.com/forumdisplay.php?f=190

  7. #7
    Member
    Join Date
    January 18th, 2005
    Posts
    64
    This is my .htaccess list, though it's not very up to date.

    RewriteEngine On
    RewriteBase /
    RewriteCond %{HTTP_USER_AGENT} ^amzn_assoc [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Webcopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Openbot/3.0 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
    RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebBandit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MicrosoftPrototypeCrawler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetResearchServer/2.7 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
    RewriteCond %{HTTP_USER_AGENT} Zeus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} webinator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NPBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InternetSeer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EasyDL/3\.04 [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
    RewriteCond %{HTTP_USER_AGENT} almaden [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Link [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGather [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ProductionBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
    RewriteCond %{HTTP_USER_AGENT} ^LiteBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zao [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^trademarktracker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HARVEST_VERSION [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EducateSearch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFilter [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZip [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} DTS\ Agent [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mac\ Finder [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} Java [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^OfflineExplorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^lachesis [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NaverRobot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crawl_Application [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4\.7
    RewriteRule ^.* - [F]
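
    As an alternative to a long chain of RewriteCond lines, the same kind of user-agent blocking can be sketched with mod_setenvif. This is only an illustration: it assumes mod_setenvif is available and that the older Order/Allow/Deny directives are in use (Apache 2.2 and earlier, or 2.4 with mod_access_compat). The variable name bad_bot is arbitrary, and the three agents are just samples from the list above.

    ```apache
    # Tag any request whose User-Agent matches a pattern with the
    # environment variable bad_bot (matching is case-insensitive).
    SetEnvIfNoCase User-Agent "^Wget" bad_bot
    SetEnvIfNoCase User-Agent "^larbin" bad_bot
    SetEnvIfNoCase User-Agent "HTTrack" bad_bot

    # Allow everyone except requests tagged above, which get a 403.
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    ```

    Each pattern sits on its own line with no [OR] flags to keep in sync, so adding or removing a bot is a one-line change.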
