  1. #1
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    TwengaBot Sucks...
    To me, the TwengaBot Sucks our bandwidth without any benefit to us, the affiliates...

    Note: I'm posting this on the "Midnight Cafe" forum instead of the "Search Engine Insight" forum because I think that... Well, never mind what I think at this moment, but feel free to move it, or delete it if necessary...

    Today is the third time this month that these bandwidth leeches have caused bandwidth issues on two of my sites. I don't need no stinking Twenga bot scraping my sites for their own gain...

    TwengaBot

    What is the TwengaBot and why should I let it crawl my site?
    TwengaBot is a robot similar to GoogleBot and other automated web-crawlers.
    TwengaBot searches the web to identify online shops, and then collects all product information in order to display it, free of charge, on Twenga ...


    Google makes me money; Twenga leeches my bandwidth free of charge.


    Why do I see the TwengaBot everyday?
    In order to best represent your shop, TwengaBot will pass by almost every day to update your product, price or inventory changes. But not to worry, our Bot has been specifically designed not to overload your bandwidth when it is crawling the site. Once on your site, TwengaBot will crawl no more than one page every few seconds. ...
    Another...

    Btw, I'm surprised that there is no search result for Twenga on ABW, because after a few searches on G., I noticed that they have been causing bandwidth problems for other people for a few years already...

    One question:

    Is the TwengaBot any better than a parasite?

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Nunya, Business
    Posts
    23,684
    I never heard of them but damn:

    1,012,665 users online today, 433,163,944 products, 215,253 stores.

    But Mr. Sal, look:

    "With all of this being free of charge and completely oriented around the consumer, there is everything to be gained from being crawled by TwengaBot!"

    You should feel honored, there is just everything to be gained for you, for letting them scrape your site.

  3. #3
    Roll Tide mobilebadboy's Avatar
    Join Date
    January 18th, 2005
    Location
    Mobile, Alabama
    Posts
    1,220
    One of the many, many bots blocked in my .htaccess (at least by user agent).

    Code:
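    # Turn on mod_rewrite (skip this line if it is already enabled earlier in the file)
    RewriteEngine On
    # Block requests that send an empty user agent
    RewriteCond %{HTTP_USER_AGENT} ^$ [NC,OR]
    # Block any user agent containing one of these tokens ([NC] = case-insensitive)
    RewriteCond %{HTTP_USER_AGENT} ^.*(192\.comagent|1noonbot|4arcade|80legs).*$ [NC,OR]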
    RewriteCond %{HTTP_USER_AGENT} ^.*(accoona|aipbot|aihitbot|aisearchbot|alkaline|allrati|amagit|analysis|analyticsseo|apexoo|appie|almaden|askpeter).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(baidu|bdfetch|becomebot|biible|biz360|blogsearchbot|blogstreetbot|bookdog|botw|bpimagewalker).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(bravobrian|browseremulator|bruinbot|builtwith|busiverse|busybot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(catchbot|cazoodle|ccbot|ccubee|cell-phone|charlotte|chilkat|citeseerxbot|CMS\ Survey|comodo|compatible;\ ics).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(convera|copyrightsheriff|crawllybot|csimplespider|cyberpatrol).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(datacha0s|daumoa|dblbot|dealgates|dejavu|depspid|diamondbot|discobot|dotbot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(domaingoat|domino|dotcomhints|dragonfly|duckduck|dulance).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(easydl|ebiness|emailsiphon|emeraldshield|entireweb|envolk|equilibrium|exabot|ezooms).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(factbot|fairshare|fast|findlinks|firesignbranding|flicky|followsite|folkd|fyber).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(gaisbot|geniebot|gigabot|gigablast|gingercrawler|girafa|goforitbot|grub|guruji).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(heritrix|htmlparser|hosting-advisor|http/1\.0|httrack|huawei).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(i1searchbot|ia_archiver|iaea\.org|iceweasel|ichiro|influencebot|infobay|ips-agent|itrovatore|indy|ip2mapbot|irlbot|iurl).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(jadynave|jakarta|java|joedog|jyxobot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(kalooga|kbeebot|kfsw|kilomonkey|krugle).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(lanshan|larbin|largesmall|legalanalysis|legalx|libcurl|libwww|linguee|localcom|lucene|lworldmedia|lwp-trivial).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(marketdefender|metauri|microsoft\ data\ access|microsoft\ url|milensbot|missigua|mj12bot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(mlbot|mojeek|moreoverbot|mqbot|msnbot-media|msrbot|multicrawler|mvaclient|myfamilybot|my-robot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(naver|netresearchserver|nexen|netcraft|nextgensearch|ng-search|nicebot|nimble|noxtrum|npbot|nutch).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(obot|omgili|omniexplorer|oncheck|outfoxbot).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(pagegetter|page_prefetcher|panscient|parchbot|phpcrawl|pickle|pingdom|plantynet|poe-component-client).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(probe|proximic|psbot|psycheclone|ptech|purebot|purity|pycurl|python-urllib).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(radian6|rampybot|rankurbot|rdfbot|relevantnoise|robotgenius).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(sbider|scej\ psp|scoutjet|scspider|scumbot|search17|sensis|seznam|sgbot|sheenbot|shim-crawler).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(shopsalad|szukacz|smagent|smhelper|snapbot|snappy|snoopy|societyrobot|sogou|speedy|soso|spbot|sproose).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(setooz|surveybot|susie|sygolbot|syndk8|synoo|syntryx).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(tagoo|taptu|tasapspider|terrawiz|thunderstone|thumbshots|tineye|tra\.cx|turnitin|twenga|twiceler).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(updated\.com|vespa|visbot|voila|voyager).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(w3composition|webalt|webcapture|webcrawler|webdatacentre|websauger|website\ explorer).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(websquash|webstripper|websurfer|webvuln|wikiwix|winhttprequest|winkbot|winwebbot|wwwster).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(XLNT|xmarks).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(yandex|yanga|yebolbot|yodao|yoogli|youdao).*$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^.*(zedzo|zend_http_client|zeus).*$ [NC]
    RewriteRule ^.*$ - [F]

    Shawn Kerr (.com) | Disney World | SEC Football
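
For anyone who wants to sanity-check a token list like the one above before it goes into .htaccess, here is a rough Python sketch of what one of those RewriteCond lines does: a case-insensitive search for any listed token anywhere in the user agent. The token list is a small subset of the list above, and the sample user-agent strings are made up for illustration. Apache uses PCRE rather than Python's `re`, so treat this as an approximation, not a faithful emulation.

```python
import re

# A small subset of the tokens from the list above (not the full blocklist).
BLOCKED_TOKENS = ["twenga", "twiceler", "mj12bot", "httrack"]

# Rough equivalent of ^.*(a|b|c).*$ with [NC]: a case-insensitive search
# for any token anywhere in the string. re.escape() keeps dots literal.
BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_TOKENS), re.IGNORECASE
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the request would hit the [F] (403 Forbidden) rule."""
    # An empty user agent is caught by the first condition (^$).
    if user_agent == "":
        return True
    return BLOCKED_RE.search(user_agent) is not None


# Hypothetical user-agent strings, for illustration only.
print(is_blocked("Mozilla/5.0 (compatible; TwengaBot/1.0)"))   # True
print(is_blocked("Mozilla/5.0 (Windows NT 6.1) Firefox/5.0"))  # False
print(is_blocked(""))                                          # True
```

Short tokens such as `java` or `fast` match anywhere in the string, so it is probably worth running a check like this against user agents from your own logs before deploying a long list.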

  5. #4
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    Originally posted by Trust:
    I never heard of them but damn:

    1,012,665 users online today, 433,163,944 products, 215,253 stores.

    "With all of this being free of charge and completely oriented around the consumer, there is everything to be gained from being crawled by TwengaBot!"

    You should feel honored, there is just everything to be gained for you, for letting them scrape your site.
    Thanks for the sarcastic reply Trust, but...

    The affiliate marketing idea is no longer like that old idea from the Field of Dreams movie... "If you build it [Your Site], he will come [Google]"...

    Now it's like... If you build it [Your Site], these leeches and parasites will definitely Suck You Dry!

  6. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Nunya, Business
    Posts
    23,684
    Oh, I actually agree with you. I checked a few of my sites and apparently they're not worthy, not seeing that bot. I really don't get in there and check which bots are sucking up the bandwidth like I should. Something else to put on the to do list.

    "The affiliate marketing idea is no longer like that old idea from the Field of Dreams movie... If you build it [Your Site], he will come [Google]..."

    That part, I don't agree with tho. It's nothing new, since the beginning. Quality. People still line up to see the Mona Lisa, nobody is checking for motel art. Obviously Mona is a bit of a stretch but you can do better than Motel 6. That was pretty corny what I just wrote.
    Last edited by Trust; July 28th, 2011 at 11:29 PM.

  7. #6
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Here's a few more to kill off:

    User-agent: robtexbot
    User-agent: pricessearchbot
    User-agent: MJ12bot
    User-agent: ZmEu
    User-agent: moget
    User-agent: ichiro
    User-agent: NaverBot
    User-agent: Yeti
    User-agent: Baiduspider
    User-agent: Baiduspider-video
    User-agent: Baiduspider-image
    User-agent: duggmirror
    Disallow: /

    The above we had put in our robots.txt files but Baidu ignores them. So we blocked them on the server level.

    The following tend to run rampant as well but we have set crawl delays:

    User-agent: ShopWiki
    Crawl-Delay: 10
    User-agent: Fatbot
    Crawl-Delay: 20
    User-agent: Gigabot
    Crawl-Delay: 10
    Salty kisses, Sandy toes, and a Pirate's heart...
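
A quick way to check what rules like these ask a well-behaved bot to do is Python's standard `urllib.robotparser`. The sketch below uses a shortened version of the rules and assumes the blocked group ends with `Disallow: /`; `example.com` and `SomeOtherBot` are placeholders. It only shows what a compliant crawler should conclude. As noted above, some bots ignore robots.txt entirely, which is why server-level blocking was needed.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the rules above. "Disallow: /" is the line that
# tells a compliant bot to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: MJ12bot
User-agent: Baiduspider
Disallow: /

User-agent: ShopWiki
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL, for illustration only.
url = "http://example.com/products.html"

print(parser.can_fetch("MJ12bot", url))       # False: blocked group
print(parser.can_fetch("ShopWiki", url))      # True: allowed, but rate-limited
print(parser.crawl_delay("ShopWiki"))         # 10
print(parser.can_fetch("SomeOtherBot", url))  # True: no rule applies
```

Note that `Crawl-delay` is a non-standard extension, so not every crawler honors it.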

  8. #7
    Newbie
    Join Date
    April 9th, 2011
    Posts
    22
    Hey fellas, what do you mean when you say that they "scrape" your site?

  9. #8
    Moderator BurgerBoy's Avatar
    Join Date
    January 18th, 2005
    Posts
    9,618
    They steal your site's contents to use for themselves.

    Vietnam Veteran 1966-1970 USASA

  10. #9
    Roll Tide mobilebadboy's Avatar
    Join Date
    January 18th, 2005
    Location
    Mobile, Alabama
    Posts
    1,220
    It means they take the data from your site, without permission, and use it as their own. Like me going to your site, copying everything from it and posting it on my site. But instead of doing it manually, people create software (bots) to do it for them.

    Edit: bah....


  11. #10
    Newbie
    Join Date
    April 9th, 2011
    Posts
    22
    Umm wow, that is rotten. So how do you keep that from happening? And if Google is so harsh on duplicate content, how does it distinguish between your original content and the ripped-off material these bots pick up?

  12. #11
    ABW Ambassador JoyUnltd's Avatar
    Join Date
    January 19th, 2005
    Location
    Emerald City
    Posts
    2,019
    Since UA strings change constantly and I've already seen new versions of bots, I looked around for an automated way to keep out bad bots/referrers and came across this: perishablepress.com/press/2009/03/16/the-perishable-press-4g-blacklist/

    He also has a 5G beta, created in Feb. 2011: perishablepress.com/5g-firewall-beta/

    I tried both of them and now have the 5G version in my .htaccess file. It is a static site with PHP includes plus a WP blog. Nothing seems to have broken so far. I'll check the access logs next month and see if it's doing the job.
    Renée
    Pay no attention to that woman behind the curtain. -Wizardress of Oz
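
For checking the access logs to see which bots are eating bandwidth, here is a minimal Python sketch that totals response bytes per user agent. It assumes the log is in Apache's "combined" format; the sample lines, IP addresses, and user-agent strings are made up for illustration, and the regex is deliberately simple (it will skip lines whose request field contains escaped quotes).

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
    r'(?P<bytes>\d+|-) "[^"]*" "(?P<ua>[^"]*)"$'
)


def bytes_by_user_agent(lines):
    """Sum response bytes per user-agent string."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in combined format
        size = match.group("bytes")
        # Apache logs "-" when no response body was sent.
        totals[match.group("ua")] += 0 if size == "-" else int(size)
    return totals


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [28/Jul/2011:10:00:00 +0000] "GET /a.html HTTP/1.1" 200 5120 "-" "TwengaBot/1.0"',
    '203.0.113.5 - - [28/Jul/2011:10:00:03 +0000] "GET /b.html HTTP/1.1" 200 4096 "-" "TwengaBot/1.0"',
    '198.51.100.7 - - [28/Jul/2011:10:00:04 +0000] "GET /a.html HTTP/1.1" 403 - "-" "MJ12bot/v1.4.0"',
    '192.0.2.9 - - [28/Jul/2011:10:00:05 +0000] "GET / HTTP/1.1" 200 2048 "http://example.com/" "Mozilla/5.0"',
]

# Print the heaviest consumers first.
for ua, total in bytes_by_user_agent(SAMPLE).most_common():
    print(f"{total:>8}  {ua}")
```

To run it on a real log, pass an open file object instead of `SAMPLE`; a blocked bot should show up with 403 responses and a byte total near zero.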
