  1. #1
    ABW Veteran Mr. Sal
    Join Date
    January 18th, 2005
    Posts
    6,795
    Why Are We Crawling The Web? - "The SiteSell robot"
    Why Are We Crawling The Web?

    SiteSell is gathering a statistical representation of topics presented on the Web as a whole. Each Web page visited is categorized under the topics that it represents, allowing our customers to know the percentage of Web pages that are about any particular topic.

    The actual content of all Web pages is removed from all SiteSell systems after being spidered, categorized and scored, usually within 48 hours of being visited by SBIder.

    Source: http://www.sitesell.com/sbider.html
    So why do we need this SiteSell robot visiting our sites anyway?

    I think we should have a sticky thread somewhere where we can list all the worthless robots that crawl our sites without giving us any benefit.

    Unless we compile a list of worthless robots for everyone to see and act on, we're just giving free bandwidth to those who benefit from our work, while we may get nothing useful back in return.




    ...

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    It won't do us a bit of good; it's just for their benefit.

    User-agent: SBIder
    Disallow: /

    At least they identify themselves. If only there were a way to disallow all those unidentified homegrown scrapers.
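    A quick way to check that a rule like this does what you expect is Python's standard-library robots.txt parser. This is a minimal sketch: www.example.com and the page path are placeholders, and it only shows what a well-behaved crawler would conclude from the file. Scrapers that never read robots.txt, like the homegrown ones mentioned above, aren't affected by it at all.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post, as it would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: SBIder
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# SBIder matches the rule and is shut out of every path; an agent with no
# matching group (and no "User-agent: *" fallback) is allowed by default.
for agent in ("SBIder", "Googlebot"):
    allowed = parser.can_fetch(agent, "http://www.example.com/some-page.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```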

  3. #3
    Full Member markschok
    Join Date
    January 18th, 2005
    Posts
    269
    I remember Mike and Charlie posted a list of bad bots some time ago; maybe that would be a starting point.
    The only bot/agent I have banned just now is cometsystems.

  4. #4
    Newbie
    Join Date
    November 18th, 2006
    Posts
    11
    Hi,

    But I've never heard of this anywhere before... How much bandwidth can they take? Don't they just visit and crawl the site?!

    Karl
    [Removed Manual Signature]
    Last edited by DesignerWiz; November 18th, 2006 at 11:56 AM. Reason: Removed Manual Signature

  5. #5
    15 years and counting
    Join Date
    January 18th, 2005
    Posts
    6,121
    Quote Originally Posted by smithkarl
    How much brandwidth can they take?
    And that's for just one of my sites. I have around 100, so how much bandwidth is that? EVERY DAY.

    34 different robots                       Hits   Bandwidth   Last visit
    Unknown robot (identified by 'crawl')     9237   451.23 MB   17 Nov 2006 - 23:59
    MSNBot                                    7720   406.82 MB   17 Nov 2006 - 23:54
    Googlebot                                 4042   207.46 MB   17 Nov 2006 - 21:50
    DDSE robot                                1706    95.27 MB   17 Nov 2006 - 12:39
    AskJeeves                                 1402    74.69 MB   17 Nov 2006 - 22:50
    Inktomi Slurp                             1036    51.76 MB   17 Nov 2006 - 23:53
    Voila                                      879    47.65 MB   17 Nov 2006 - 14:38
    MSIECrawler                                 62    77.43 MB   17 Nov 2006 - 22:44
    WISENutbot                                 623    34.13 MB   17 Nov 2006 - 23:59
    EchO!                                      499     5.02 MB   17 Nov 2006 - 21:50
    Others                                    1644    81.99 MB   17 Nov 2006

    That's 1463.45 MB every day for one site.
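    The labels in that report (for example "Unknown robot (identified by 'crawl')") look like AWStats output. As a rough illustration of how such per-robot totals are produced, and not of AWStats' own logic, the sketch below tallies hits and bytes per robot from an Apache combined-format access log. The user-agent fragments, report labels and sample log lines are all made up for the example; real analyzers match against far longer robot lists.

```python
import re
from collections import defaultdict

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent fragments -> report labels, checked in order,
# so the generic "crawl" catch-all comes last.
ROBOT_SIGNATURES = {
    "msnbot": "MSNBot",
    "googlebot": "Googlebot",
    "slurp": "Inktomi Slurp",
    "crawl": "Unknown robot (identified by 'crawl')",
}


def classify(user_agent):
    lowered = user_agent.lower()
    for fragment, label in ROBOT_SIGNATURES.items():
        if fragment in lowered:
            return label
    return None  # not recognised as a robot


def tally(lines):
    """Count hits and bytes served per robot label."""
    hits, bandwidth = defaultdict(int), defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines
        label = classify(match["agent"])
        if label is None:
            continue
        hits[label] += 1
        if match["bytes"] != "-":  # "-" means no body was sent (e.g. a 304)
            bandwidth[label] += int(match["bytes"])
    return hits, bandwidth


# Made-up sample lines.
SAMPLE = [
    '192.0.2.10 - - [17/Nov/2006:23:54:01 +0000] "GET /a.html HTTP/1.1" 200 52344 "-" "msnbot/1.0"',
    '192.0.2.20 - - [17/Nov/2006:21:50:12 +0000] "GET /b.html HTTP/1.1" 200 48012 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [17/Nov/2006:23:59:30 +0000] "GET /c.html HTTP/1.1" 304 - "-" "SomeCrawler/0.1"',
    '203.0.113.5 - - [17/Nov/2006:12:00:00 +0000] "GET / HTTP/1.1" 200 10240 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

hits, bandwidth = tally(SAMPLE)
for label in sorted(hits, key=hits.get, reverse=True):
    print(f"{label:40} {hits[label]:>6} {bandwidth[label] / 1024:>10.2f} KB")
```

    Multiplying a per-site daily total like this by the number of sites you run gives the overall cost the poster is pointing at.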

  6. #6
    15 years and counting
    Join Date
    January 18th, 2005
    Posts
    6,121
    Some more spiders and bots that don't visit every day:
    Voyager
    IA Archive
    spider
    LinkChecker
    robot
    GigaBot
    psbot
    Web Core
    Roots
    Grub.org
    Pioneer
    Calif
    SpiderMan
    Google AdSense (I don't have AdSense on my sites)
    Templeton
    arks
    larbin
    The Python Robot
    BaiDuSpider
    Yandex bot
    The World Wide Web Worm
    Ingrid
    LinkBot
    InfoSeek Robot
    StackRambler

  7. #7
    web dev with whiskers tn-morgen
    Join Date
    February 15th, 2007
    Location
    USA
    Posts
    177
    SBIder is building a "definitive" survey of keywords on the Internet to help SiteSell's subscribers build better content-oriented sites.

    Once the initial crawl is done, it probably won't hit your site more than once every two months. There are lots of other folks out there, some even checking your WHOIS info, so this one is fairly innocuous.

  8. #8
    Classic Rocker Mack
    Join Date
    January 27th, 2007
    Location
    Lower Left Coast
    Posts
    1,167
    I just use a robot trap. I can choose which robots to allow (the major engines) and block the rest, or at least try to. It's far better than doing nothing.

    This thread is a good example of what to search for: http://www.webmasterworld.com/forum13/1823.htm

    If you have a huge mall site, you have to do things like this.
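    The linked thread isn't reproduced here, but a common form of robot trap works like this: a URL that robots.txt disallows is linked where human visitors won't click it, so anything requesting it is ignoring robots.txt and can be banned, while a short allowlist keeps the major engines safe. A minimal in-memory sketch under those assumptions; the trap path, the allowlist and the IP addresses are hypothetical, and a real setup would persist the ban list and enforce it in the web server.

```python
# Hypothetical trap URL. It would be listed in robots.txt as
#   User-agent: *
#   Disallow: /bot-trap/
# and linked only where human visitors won't see it, so the clients that
# request it are mostly robots ignoring robots.txt.
TRAP_PATH = "/bot-trap/"

# User-agent fragments never banned, even if they hit the trap.
# Illustrative only: a user-agent string is easy to fake.
ALLOWED_AGENTS = ("googlebot", "msnbot", "slurp")


class RobotTrap:
    """Bans the IP of any non-allowlisted client that requests the trap path."""

    def __init__(self, trap_path=TRAP_PATH, allowed_agents=ALLOWED_AGENTS):
        self.trap_path = trap_path
        self.allowed_agents = tuple(agent.lower() for agent in allowed_agents)
        self.banned_ips = set()

    def is_allowed_agent(self, user_agent):
        lowered = user_agent.lower()
        return any(agent in lowered for agent in self.allowed_agents)

    def handle(self, ip, path, user_agent):
        """Return the HTTP status code to send for this request."""
        if ip in self.banned_ips:
            return 403  # already trapped: refuse everything from this IP
        if path.startswith(self.trap_path) and not self.is_allowed_agent(user_agent):
            self.banned_ips.add(ip)
            return 403
        return 200


trap = RobotTrap()
print(trap.handle("198.51.100.7", "/index.html", "BadBot/1.0"))  # 200: normal page
print(trap.handle("198.51.100.7", "/bot-trap/", "BadBot/1.0"))   # 403: sprang the trap
print(trap.handle("198.51.100.7", "/index.html", "BadBot/1.0"))  # 403: now banned
```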

  9. #9
    Roll Tide mobilebadboy
    Join Date
    January 18th, 2005
    Location
    Mobile, Alabama
    Posts
    1,220
    I've had that one blocked through .htaccess for a long time. Forgot all about it until now.

    Along with 136 other bots.
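    In .htaccess this kind of ban is commonly written with Apache's mod_setenvif (SetEnvIfNoCase on the User-Agent header) or with mod_rewrite. The matching logic itself is simple: a case-insensitive substring test against a list of names. The Python sketch below shows only that logic; the four fragments are bots named in this thread, not mobilebadboy's actual list, which wasn't posted.

```python
import re

# A few user-agent fragments named in this thread; an illustrative starter list.
BLOCKED_AGENTS = ["SBIder", "cometsystems", "larbin", "psbot"]

# One case-insensitive pattern; re.escape keeps characters like "." or "!" literal.
BLOCK_PATTERN = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENTS), re.IGNORECASE
)


def is_blocked(user_agent):
    """True if the User-Agent header contains any blocked fragment."""
    return bool(BLOCK_PATTERN.search(user_agent or ""))


for agent in ("SBIder/1.0", "Mozilla/5.0 (compatible; Googlebot/2.1)", ""):
    print(repr(agent), "->", "403 Forbidden" if is_blocked(agent) else "allowed")
```

    Short fragments can match legitimate agents by accident, so it's worth testing a new entry against the agents you want to keep before adding it.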

    Shawn Kerr (.com) | Disney World | SEC Football

  10. #10
    general fuq mrbshouse
    Join Date
    January 18th, 2005
    Location
    Argieville
    Posts
    1,381
    I went so far as to block their IPs; they were coming from 4 or 5 blocks, if I remember correctly...

    "Why are we visiting you, so we can scrape your content and resell it to people who pay us"
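    Blocking by address catches a bot even if it changes its user-agent string. A minimal sketch using Python's ipaddress module to test whether a client falls inside any banned CIDR block; the post doesn't give the actual ranges, so the three reserved documentation networks from RFC 5737 stand in for them.

```python
import ipaddress

# Hypothetical stand-ins for the banned blocks (RFC 5737 documentation ranges).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]


def is_blocked_ip(address):
    """True if the client address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions are safe.
    return any(ip in network for network in BLOCKED_NETWORKS)


for client in ("198.51.100.42", "203.0.113.200", "192.0.3.1"):
    print(client, "blocked" if is_blocked_ip(client) else "allowed")
```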
