  #1 - BurgerBoy (Moderator)
    Ban Bad Bots And Countries
    I have started banning bad bots, and countries, from my sites using .htaccess rewrite rules.

    It works really well. It automatically returns a 403 to them and bans them.

    I thought maybe some of you all would like the information so you can use it on your sites.

    Code:
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} \.ru [NC,OR]
    RewriteCond %{HTTP_REFERER} \.kz [NC,OR]
    RewriteCond %{HTTP_REFERER} \.in [NC,OR]
    RewriteCond %{HTTP_REFERER} \.lv [NC,OR]
    RewriteCond %{HTTP_REFERER} \.ua [NC,OR]
    RewriteCond %{HTTP_REFERER} \.cn [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Baiduspider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Ezooms* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*AhrefsBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*proximic* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Twiceler* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Java* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*spbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*libwww-perl* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DotBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Sogou-Test-Spider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*agbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GeoHasher* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*TurnitinBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JikeSpider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*voilabot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Sosospider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Wayback* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*80legs* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*coccoc* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*YodaoBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Exabot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Nutch* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DigExt* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*mgmt.mic* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SeznamBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*discoverybot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MJ12bot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SearchmetricsBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SEOstats* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GrapeshotCrawler* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*YandexBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*meanpathbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*YYSpider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Yeti* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MyNutchTest* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CareerBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Wotbox* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*A6-Indexer* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*sogou* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*seoresearch* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*accelobot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Alcohol\ Search* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebMoney\ Advisor* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*news\ bot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*evuln.com* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*r-e-f-e-r-e-r.com* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*aboutthedomain* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Zeus* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*larbin* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BlackWidow* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Custo* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DISCo* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Demon* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*eCatch* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EirGrabber* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Express\ WebPictures* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EyeNetIE* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*FlashGet* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GetRight* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Go-Ahead-Got-It* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Grafula* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*HMView* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Indy\ Library* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Stripper* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Sucker* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*InterGET* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Internet\ Ninja* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JetCar* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*JOC\ Web\ Spider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*libghttp* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*LeechFTP* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mass\ Downloader* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*MIDown\ tool* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Missigua* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Navroad* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NearSite* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetAnts* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetSpider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Net\ Vampire* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NetZIP* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*PageGrabber* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Papa\ Foto* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*pavuk* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*RealDownload* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ReGet* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SiteSnagger* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SmartDownload* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SuperBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*SuperHTTP* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Surfbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*tAkeOut* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Teleport\ Pro* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*VoidEYE* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Sucker* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebCopier* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebFetch* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebGo\ IS* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebLeacher* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebReaper* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebSauger* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Website\ eXtractor* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Website\ Quester* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebStripper* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebWhacker* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WWWOFFLE* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Alexibot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Anonymouse.org* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*asterias* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BackDoorBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BackWeb* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BatchFTP* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Bigfoot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Black.Hole* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BlowFish* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BotALot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Buddy* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BuiltBotTough* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Bullseye* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*BunnySlippers* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Cegbfeieh* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CheeseBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CherryPicker* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Collector* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Copier* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*CopyRightCheck* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*cosmos* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Crescent* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DIIbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*DittoSpyder* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Demon* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Devil* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Wonder* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*dragonfly* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Drip* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*eCatch* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EasyDL* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ebingbong* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EmailCollector* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*EroCrawler* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*FileHound* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Harvest* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*IlseBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*InfoNaviRobot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*InfoTekies* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Iria* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Jakarta* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Jyxobot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Kenjin.Spider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Keyword.Density* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NAMEPROTECT* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*NimbleCrawler* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*OutfoxBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Xenu* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WISENutbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Zyborg* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*tab.search.daum.net* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Web.Image.Collector* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*WWW-Collector-E* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*sitecheck.internetseer.com* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*YisouSpider* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*linkdexbot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*Add\ Catalog* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw* [NC]
    RewriteRule .* - [F]
    Do not put an [OR] - or a [NC,OR] - at the end of the last condition in the list, or you will ban yourself from your site, and everybody else too.

    As new bots and countries show up, just add them to the list.

    Example: RewriteCond %{HTTP_USER_AGENT} ^.*WhateverBotYouWantToBan* [NC,OR]

    Put it before the:

    RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw* [NC]
    RewriteRule .* - [F]

    To Ban Country:

    RewriteCond %{HTTP_REFERER} \.ru [NC,OR]

    This bans Russia. Just make a new line with other countries you want to ban.

    Put it before the:

    RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw* [NC]
    RewriteRule .* - [F]

    Also:

    If a bot name has a space in it, you have to put a \ before each space.

    Example:

    RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector* [NC,OR]

    Have fun kicking them out of your sites. It works every time.
    Last edited by BurgerBoy; July 28th, 2013 at 11:04 AM. Reason: Added New Bot

    Vietnam Veteran 1966-1970 USASA
    ABW Forum Rules - Advertise At ABW


  #2 - BurgerBoy (Moderator)
    Three new bots to ban since I made the above post:

    RewriteCond %{HTTP_USER_AGENT} ^.*CrystalSemanticsBot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*ContextAd\ Bot* [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*grapeFX* [NC,OR]


  #3 - BurgerBoy (Moderator)
    Another one to add:

    RewriteCond %{HTTP_USER_AGENT} ^.*SISTRIX\ Crawler* [NC,OR]


  #4 - Beachy Bill
    Thank you for providing this info as a copy-n-paste resource.
    Bill / Marketing Blog @ 12PM - Current project: Resurrecting my "baby" at South Baltimore..
    Cute Personal Checks and Business Checks
    If you are too busy to laugh you are too busy.

  #5 - Prosperent
    This is a tough one to tackle. We use Cloudflare, which has a firewall that allows you to block IPs, countries, even ranges. You can also set up different base security levels so you can filter quite a bit right from the start. We find that we have to add a few dozen IPs a week, and the majority can't be blocked by user agent alone.

  #6 - Steve
    Does it actually work? Banning a country?

  #7 - Prosperent
    It does with Cloudflare, in my experience. I prefer blocking things upstream of my servers to save on the processing of each request that comes in. It saves us quite a bit based on our traffic volume, but it's probably not such a big deal for most sites.

  #8 - Convergence
    Quote Originally Posted by Steve View Post
    Does it actually work? Banning a country?
    Not with Burgerboy's code above.

    Code:
    RewriteCond %{HTTP_REFERER} \.ru [NC,OR]
    RewriteCond %{HTTP_REFERER} \.kz [NC,OR]
    RewriteCond %{HTTP_REFERER} \.in [NC,OR]
    RewriteCond %{HTTP_REFERER} \.lv [NC,OR]
    RewriteCond %{HTTP_REFERER} \.ua [NC,OR]
    RewriteCond %{HTTP_REFERER} \.cn [NC,OR]
    That will only block traffic coming from another website that has that domain extension.

    On a dedicated server (maybe a VPS) you can use a firewall to block individual IPs and IP ranges (CIDRs). But blocking, say, all of China or Russia will cause serious problems with most servers' firewalls - trust me, we have tried.

    So the solution we came up with is to use MaxMind's geoIP database and only ALLOW specific countries. We have the db on the server which allows us to only need a few lines of code in an .htaccess file.

    Code:
    GeoIPEnable On
    SetEnvIf GEOIP_COUNTRY_CODE US AllowCountry
    SetEnvIf GEOIP_COUNTRY_CODE CA AllowCountry
    
    Deny from all
    Allow from env=AllowCountry
    This is probably possible on shared hosting - however, you would need to have the db available to EACH domain on the hosting account, and you would have to note the location of that db in .htaccess. I have never tried it. But you can search online for MaxMind geoIP and should be able to find it.
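For the shared-hosting case just mentioned, the legacy mod_geoip module lets you point at the database file explicitly with GeoIPDBFile - a sketch, assuming your host compiles mod_geoip in, with a made-up path you would adjust to wherever you uploaded the .dat file:

```apache
<IfModule mod_geoip.c>
    GeoIPEnable On
    # Hypothetical path - adjust to where your MaxMind GeoIP.dat actually lives
    GeoIPDBFile /home/youraccount/geoip/GeoIP.dat
</IfModule>

SetEnvIf GEOIP_COUNTRY_CODE US AllowCountry
SetEnvIf GEOIP_COUNTRY_CODE CA AllowCountry

Deny from all
Allow from env=AllowCountry
```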

    Now, we still block trouble IPs in the firewall. Firewall will block all access to the server while the geoIP will only block/allow traffic to the website. Combination of both works best. IE: Bad script from Timbuktu scanning for open ports on the server is best handled by the firewall, but general traffic originating from Timbuktu IPs will get a 403 forbidden page.

    cPanel hosting has an IP deny manager if you don't have access to the server's firewall - it will put blocked IPs in your .htaccess for you...
    Last edited by Convergence; July 29th, 2013 at 03:39 PM.
    Salty kisses, Sandy toes, and a Pirate's heart...


  #10 - Prosperent
    We used to do that, but running through our stats, about 18% of our sales take place from IPs outside of the United States, even when we only offered a US API. So be careful blocking countries; you will definitely end up blocking sales as well.

    Top countries for U.S products for us are as follows:

    Canada
    Australia
    United Kingdom
    Singapore
    Japan
    Korea
    Hong Kong
    New Zealand
    Mexico
    Switzerland
    France
    Taiwan
    Russia
    Saudi Arabia
    Germany

    (and about 120 others)

  #11 - Convergence
    Quote Originally Posted by Prosperent View Post
    So, be careful blocking countries, you will definitely end up blocking sales as well.
    Yes, but compared to the bad traffic, scrapers, support ticket spammers, etc - wasn't worth it. The time it takes to clean up the damage didn't justify the amount of sales.

    However, when we launch our flagship in the UK, we will have to open up our .com to UK traffic as well. Hoping the sales will offset the damage that will be done by the bad guys...

  #12 - Convergence
    Top it off that some brands, such as Nike, do not allow merchants to ship out of the country. Not to mention some merchants don't pay commissions on exported orders.

    Until we can get export restrictions and commission exceptions from every merchant, EASILY - then blocked they will be...

  #13 - Prosperent
    I know what you mean, but 18% is a lot of income to lose. We ended up using cloudflare for this very reason. Their low security setting ends up blocking about 90% of the problem traffic. The rest is pretty easy to manually block via regular traffic audits.

  #14 - 2busy
    Hi all, I heard about this post and after taking a look I had to come in to add a few points. The code at the top is possibly better than nothing, but there are a few issues with it.

    First is that many bots in that list have not been seen for many years. A shotgun approach adds a lot of extra work for zero benefit at a cost of slowing down your server. One suggestion is to learn to use regular expressions before pasting code into your htaccess file.

    The ^ character means 'the start of the string', '.' means any single character (virtually ANYthing except a line break), and '*' means 'zero or more of whatever came before it'. So translating a line for example:
    ^.*Baiduspider* [NC,OR]
    is telling the server to start at the beginning of the string, skip over any run of characters, and then find 'Baiduspide' followed by zero or more 'r' characters. The leading ^.* is pointless (RewriteCond patterns are not anchored, so they already match anywhere in the string), and the trailing * actually makes the final letter optional - the pattern is both wasteful and imprecise.
    Fortunately, the correct code is even easier to use. If each line is changed to this format:
    Baiduspider [NC,OR]
    then the server looks for exactly that string, and to make it more efficient you can match more than one User_agent per line with a group (...) and some pipes '|' that mean 'or'. This is an example (incomplete - make your own) that I use on most sites:
    Code:
    RewriteCond %{HTTP_USER_AGENT} (Access|Ahrefs|appid|Blog) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Capture|Client|Copy|crawler|curl) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Data|devSoft|Domain|download) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Engine|Exabot|fetch|filter|flip|genieo) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Jakarta|Java|Library|link|libww) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (MJ12bot|Netcraft|news|nutch) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (Pages|Proxy|Publish|Python|scout) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (scraper|SiteExplorer|snippets|spider) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} (urllib|Wget|Win32|WinHttp|Wotbox) [NC]
    RewriteRule .* - [F]
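You can check the point about the redundant ^.* and the trailing * with any regex engine. Apache uses PCRE, but Python's re module behaves the same way for these patterns - this is just an illustration, not Apache itself:

```python
import re

UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

# The original pattern: '^.*' is redundant (the match is unanchored anyway)
# and the trailing '*' makes the final 'r' optional ('r*' = zero or more 'r'),
# so it even matches the misspelling 'Baiduspide'.
old = re.compile(r"^.*Baiduspider*", re.IGNORECASE)
assert old.search(UA)
assert old.search("Baiduspide")        # unintended match

# The plain substring form recommended above:
plain = re.compile(r"Baiduspider", re.IGNORECASE)
assert plain.search(UA)
assert plain.search("Baiduspide") is None

# Grouped alternation - one line covering several bots:
group = re.compile(r"(Ahrefs|MJ12bot|Wget)", re.IGNORECASE)
assert group.search("Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)")
assert group.search("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0)") is None
```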
    John posted a most valuable Spider Trap here a few years ago at http://forum.abestweb.com/showthread.php?p=902504 - I don't know if the link still works, but it is all in one place exactly how to automatically trap scrapers and automatically block them from your site. And it works on every site I've ever set up whether WP or html.

    The best practice to eliminate unwanted traffic without accidentally blocking traffic comes from a combination of codes and regularly examining your raw access logs to see exactly who's doing what on your site. Get friendly with whois so you can learn how to block entire ranges of bad players with a tiny line added to your htaccess.

    I highly discourage people from copying and pasting things into their .htaccess without a very good understanding of regex and of exactly what you are telling the server to do IN THE PROPER ORDER. Sorry for the caps, but that is FAR more important than adding the code. Servers are not all set up the same way, and you may need to make changes if you move your domain - what works on Host-X might break Host-Y. Unless you have server access to set up the HTTPD config, you are making assumptions that need to be tested before relying on them.

    For my own sites I maintain a database so I can look up suspicious activity quickly to determine whether to block a single IP or entire CIDR ranges.
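One possible shape for such a lookup database - a hypothetical sketch using Python's built-in sqlite3, not anyone's actual setup; the real thing can be anything from a spreadsheet on up:

```python
import ipaddress
import sqlite3

# Hypothetical schema: one row per blocked range.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE blocks (
    cidr  TEXT PRIMARY KEY,  -- range in CIDR notation
    owner TEXT,              -- whois org, e.g. 'OVH'
    added TEXT               -- date looked up, so stale blocks get revisited
)""")
con.execute("INSERT INTO blocks VALUES ('72.14.192.0/18', 'Google', '2013-07-27')")

def find_block(ip):
    """Return the owner of the first stored CIDR containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for cidr, owner in con.execute("SELECT cidr, owner FROM blocks"):
        if addr in ipaddress.ip_network(cidr):
            return owner
    return None

assert find_block("72.14.200.1") == "Google"   # inside 72.14.192.0/18
assert find_block("8.8.8.8") is None           # not blocked
```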

    This is by no means a complete how-to, but I hope it helps some start looking in the right places for answers about what we can do to try to protect our content from some of the malicious traffic that comes through.


  #15 - Convergence
    Nice to see you, 2Busy - thanks for posting...

  #16 - ABW Ambassador
    Hi 2busy - So nice to see you posting again and sharing your knowledge.

  #17 - ABW Ambassador
    I'm so glad to see you back and post, 2busy. How have you been?

    I was going to start a thread "where is 2 busy" the other day. Maybe you heard me...

  #18 - 2busy
    Hi all, good to see you all again too! I did not mean to take this thread off track, I hope that everyone who is concerned about scrapers will learn to control their traffic. It feels good to see a thousand pageloads a day but if your access logs show that 300 of those were in 10 minutes to a User_agent named Site-Explorer it is something you want to be able to prevent.

    I also wanted to thank BurgerBoy for trying to help. I used a similar kind of copy-and-paste code I had found somewhere and felt like I had locked my content down a few years back, before I started reading up on it all. There are a lot of sites out there sharing outdated information, and Apache has had several major updates since the time the code shown here may have worked. That is why it is good to visit the documentation for your server. These codes and .htaccess files only work on Apache/Linux/Unix servers; if your sites are hosted on Windows servers, don't bother reading about this - it won't help, and I know absolutely nothing about what to do for Windows servers. In all cases your host should be able to help you with correctly configuring your settings files to work with their server setup.

    If you are using mod_access directives (like those you set up with John's Spider Trap) to deny access to IPs, one thing I learned is that you can block entire assigned ranges by using CIDRs. If you are using John's Spider Trap it will automatically block by IP - again, it only works on non-Windows servers - but in the case of a server dedicated to mischief that only slows them down. Viewing your raw access logs, you can see a robot change IP addresses during a session. So if you have blocked IP 123.45.67.89, you haven't stopped access: they may be right back on 123.45.67.90. To block them all you need to close the whole range assigned to that server. Whois will give you that information. It may look like 123.45.0.0/16, and that will block all IPs from 123.45.0.0 to 123.45.255.255 (for example) using just one line in your .htaccess. Unless you are a binary numbers genius you will need to look up each one, because there is no way to tell by eye what range is assigned to what server. You don't want to block by guessing; you might accidentally include Verizon in the range for some scraper.
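If you want to double-check what a CIDR actually covers before dropping it into .htaccess, Python's standard ipaddress module makes a quick sanity check - an illustration only, not part of the Apache setup:

```python
import ipaddress

# One line like 'deny from 123.45.0.0/16' covers every address whose first
# two octets are 123.45 - 65,536 addresses in all.
net = ipaddress.ip_network("123.45.0.0/16")
assert net.num_addresses == 65536
assert ipaddress.ip_address("123.45.255.255") in net
assert ipaddress.ip_address("123.46.0.0") not in net

# A lower prefix number means a LARGER range: /12 is 16x bigger than /16.
assert ipaddress.ip_network("123.32.0.0/12").num_addresses == 2 ** 20
```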

    As for blocking entire countries, that is extremely difficult because IPs are not assigned strictly by country. If you visit InterNIC, you can get a list of which registry controls each range, but RIPE covers countries you may want to block as well as countries you want to have visiting. Everything from France and Great Britain to Russia and Latvia is handled by RIPE. APNIC covers China - and Australia. So knowing that every IP that starts with 211. is covered by APNIC does not give you enough information to decide whether to block all 211. IPs. That is why it is a good idea to start building yourself a database as you look up each IP you want to block. You will see a pattern emerge and learn, for example, to block all servers operated by OVH. Also important is to keep track of the dates: a block that stops robots today may stop customers tomorrow if you don't know when the allocation expires.

    If you use a Mac, there is a handy "Network Utilities" tool included with a program called iStat Menus (about $30 at the App Store), which gives you an easy way to run whois lookups from your desktop.

    I gotta run, but I'll be happy to help with any questions you might have about this or the Spider Trap setup.
    Cheers!

  #19 - BurgerBoy (Moderator)
    Hi 2busy. Glad to hear from you again. Thanks for your help.


  #20 - Convergence
    A great tool for looking up IPs and getting their CIDRs, so you can block countries more effectively by IP:

    https://www.countryipblocks.net/

  #21 - 2busy
    A good place to get current lists is Okean - The Goods - and there are several places to buy them. I prefer to do it myself because I have the desktop app, and it lets me verify a range rather than just getting a list without knowing for sure that the range covers the particular IP I'm blocking. Sometimes CIDRs can appear to be "too far away" to be the right one. One hint: the number after the / at the end indicates the size of the range. The lower that number is, the larger the range is.

    The whois from RIPE always gives you the CIDR, but ARIN (USA/Canada) seldom does, and APNIC never does. When they give you a plain range, there are places online that will convert the range to a CIDR. Some surprising info turns up sometimes. I just caught this two days ago: 72.14.192.0 - 72.14.255.255, which belongs to Google. I would never have guessed. It was caught grabbing images with User_agent: GoogleProducer; (+http://goo.gl/7y4SX). It is blocked now with "deny from 72.14.192.0/18" until I know what the heck it is and why it is taking images.
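If you'd rather not rely on an online converter, Python's standard ipaddress module can do the range-to-CIDR conversion locally - just an illustration; any equivalent tool works:

```python
import ipaddress

# Convert a whois start/end range into CIDR notation.
start = ipaddress.ip_address("72.14.192.0")
end = ipaddress.ip_address("72.14.255.255")
cidrs = list(ipaddress.summarize_address_range(start, end))

# This range collapses to a single block - one 'deny from' line.
assert [str(c) for c in cidrs] == ["72.14.192.0/18"]
```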

    Another handy place to visit is at http://www.botopedia.org/ where you can learn more about bots, blocking and User_agent strings.
