  1. #1
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    Would it be worth it to ban the image bin?
    http://images.google.com/ is driving me googly.

    What would I need to put into my meta sections of my pages to ban google images spiders/bots/bin dwellers?

    Or would it be best to just stay with the status quo?
    ~Rhia7 -- Remember the 7

  2. #2
    notary sojac Herb ԿԬ's Avatar
    Join Date
    January 18th, 2005
    Location
    Central/Western NY State
    Posts
    7,741
    very good question. one of my sites gets 270-350 image search hits a day. I've been looking at the specific search terms and origins and I'm still wondering what to do. and the images are not on my sites, just appearing in PSC code from a merchant.

    Since I can see that some of the searches produce more than one hit at a time from the same visitor, these could be resulting in sales, so I'm holding off on doing anything for now.

    I'm more nervous about ask.com's spider hits causing me a usage expense on a server. They put a real spike lasting several days in the stats when I upload a certain very large site update.
    Last edited by Herb ԿԬ; June 27th, 2006 at 11:31 AM. Reason: added copy

  3. #3
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    Quote Originally Posted by Herb ԿԬ
    very good question. one of my sites gets 270-350 image search hits a day. I've been looking at the specific search terms and origins and I'm still wondering what to do. and the images are not on my sites, just appearing in PSC code from a merchant.
    The Google Images search hits on my sites break down like this: 75% are merchant product images, and 25% are actual GIFs, JPEGs, and photos from my sites.

    On the one hand, I'd like to block the Google Images spiders/bots, but I don't want to block Google. On the other hand, there is a possibility that Google Images results in a sale (although the likelihood is slim, I know of a few sales that resulted from Google Images searches).

    So should I block spiders in the meta? How does one block the Google Images spiders/bots but not Google?

  4. #4
    ABW Ambassador ToughTurkey's Avatar
    Join Date
    January 18th, 2005
    Posts
    993
    You don't do it through the meta tags, you do it through your robots.txt file

    Create a simple text file in notepad or wordpad with the following text:

    User-agent: Googlebot-Image
    Disallow: /

    Then save it (as robots.txt) and upload it to your root directory. This does not disallow googlebot, only googlebot-image.

    IMHO, I don't think many potential customers do image searches. I think a lot are webmasters looking for images to use on their own websites.
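
    One way to sanity-check those two lines before uploading them is to feed them to a robots.txt parser. This is only a rough sketch using Python's standard urllib.robotparser, which is not Google's own parser; the example.com URL and the can_crawl helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the post above.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical image URL, used only for the check.
    image_url = "http://www.example.com/photos/cat.jpg"
    for agent in ("Googlebot-Image", "Googlebot"):
        print(agent, can_crawl(ROBOTS_TXT, agent, image_url))
    # Googlebot-Image False
    # Googlebot True
```

    If the second line printed False, the file would be shutting out regular Googlebot as well, which is not what the rule is meant to do.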

  5. #5
    ABW Ambassador ToughTurkey's Avatar
    Join Date
    January 18th, 2005
    Posts
    993
    Here is a great robots.txt that was posted at WMW that I have used for a long time now. It bans most of the scraper bots and other malicious bots as well as googlebot-image.

    Feel free to cut and paste as long as you abide by its header instructions...

    (If I were you, I'd spend some time educating myself about exactly what this code does before I'd use it, but I've used it and rank great on the big 3 SE's.)


    #
    # WebmasterWorld.com: robots.txt
    # GNU Robots.txt Feel free to use with credit
    # given to WebmasterWorld.
    #
    # Please, we do NOT allow nonauthorized robots any longer.
    # Yes, feel free to copy and use the following.

    User-agent: Googlebot-Image
    Disallow: /

    User-agent: NetMechanic
    Disallow: /

    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: http://www.SearchEngineWorld.com bot
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: MSIECrawler
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /

    User-agent: sootle
    Disallow: /

    User-agent: es
    Disallow: /

    User-agent: Enterprise_Search/1.0
    Disallow: /

    User-agent: Enterprise_Search
    Disallow: /

    User-agent: MSRBOT
    Disallow: /

    User-agent: *
    Disallow: /cgi-bin/
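
    For anyone wondering how the last two lines interact with the long list above them: a named bot gets its own record, and everything else falls through to the "User-agent: *" record. Below is a rough sketch with a three-record excerpt of the list, checked with Python's standard urllib.robotparser. The allowed helper and the paths are made up for illustration, and real crawlers may match names a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the list above: two named bots plus the catch-all record.
EXCERPT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Wget
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the excerpt."""
    parser = RobotFileParser()
    parser.parse(EXCERPT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(allowed("Wget", "/index.html"))         # False: named record, whole site off limits
    print(allowed("Slurp", "/index.html"))        # True: falls through to the * record
    print(allowed("Slurp", "/cgi-bin/form.cgi"))  # False: * record blocks /cgi-bin/
    print(allowed("Googlebot", "/index.html"))    # True: not the same name as Googlebot-Image
```

    Keep in mind that robots.txt is only a request. A bot that ignores the file is not stopped by any of these records.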

  6. #6
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    Quote Originally Posted by ToughTurkey
    You don't do it through the meta tags, you do it through your robots.txt file

    Create a simple text file in notepad or wordpad with the following text:

    User-agent: Googlebot-Image
    Disallow: /

    Then save it (as robots.txt) and upload it to your root directory. This does not disallow googlebot, only googlebot-image.
    So I wouldn't attach the instructions to an .html (web page) file, I would instead save the directions as a robots.txt and then upload it into my main directory?
    Is the root directory the main directory? (Where I upload everything?)

    Thanks for the help!

  7. #7
    ABW Ambassador ToughTurkey's Avatar
    Join Date
    January 18th, 2005
    Posts
    993
    Quote Originally Posted by Rhia7
    So I wouldn't attach the instructions to an .html (web page) file, I would instead save the directions as a robots.txt and then upload it into my main directory?
    Is the root directory the main directory? (Where I upload everything?)

    Thanks for the help!
    Yes, and yes.
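
    To spell out what "root directory" means in URL terms: crawlers ask for /robots.txt at the top of the host, no matter which page they are about to visit, so a copy sitting in a subfolder is never looked at. A small sketch, assuming a made-up example.com page address and a hypothetical robots_url helper:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address a crawler would request robots.txt from for this page."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/shop/widgets/index.html"))
    # http://www.example.com/robots.txt
```

    After uploading, opening that address in a browser should show the text file.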

  8. #8
    notary sojac Herb ԿԬ's Avatar
    Join Date
    January 18th, 2005
    Location
    Central/Western NY State
    Posts
    7,741
    interesting list, there. wonder how old it is as I see a lot of new nosey bots coming down the pike over the past year.

  9. #9
    notary sojac Herb ԿԬ's Avatar
    Join Date
    January 18th, 2005
    Location
    Central/Western NY State
    Posts
    7,741
    speaking of which: I don't mind ask.com hitting my sites, but do they have to register on the counter of the hosting company? they're really inflating my bandwidth count.
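
    One way to see how much of the bandwidth count a given spider accounts for is to total the response sizes per crawler from the raw access log. This is a rough sketch that assumes the host provides logs in the common "combined" format; the sample lines, IP addresses, and user-agent strings are invented for illustration, and the bot keywords are borrowed from names mentioned elsewhere in this thread.

```python
import re
from collections import Counter

# Tail of a combined-format log line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')

# Lowercase keywords to look for in the user-agent string (an assumption).
BOTS = ("googlebot", "slurp", "msnbot", "jeeves")


def bytes_by_bot(lines):
    """Sum response bytes per bot keyword; anything unmatched counts as 'other'."""
    totals = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # skip lines that are not in the assumed format
        size = 0 if match.group(2) == "-" else int(match.group(2))
        agent = match.group(3).lower()
        label = next((bot for bot in BOTS if bot in agent), "other")
        totals[label] += size
    return totals


# Made-up sample lines, not real traffic.
SAMPLE = [
    '192.0.2.10 - - [27/Jun/2006:11:31:00 -0400] "GET /a.html HTTP/1.1" 200 5120 "-" "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"',
    '192.0.2.10 - - [27/Jun/2006:11:31:05 -0400] "GET /b.html HTTP/1.1" 200 2048 "-" "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)"',
    '192.0.2.77 - - [27/Jun/2006:11:32:00 -0400] "GET /a.html HTTP/1.1" 200 5120 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

if __name__ == "__main__":
    for label, total in bytes_by_bot(SAMPLE).most_common():
        print(label, total)
```

    With the sample lines, the "jeeves" bucket comes to 7168 bytes and "other" to 5120.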

  10. #10
    Comfortably Numb John Powell's Avatar
    Join Date
    October 17th, 2005
    Location
    Bayou Country, LA
    Posts
    3,432
    Quote Originally Posted by ToughTurkey
    Here is a great robots.txt that was posted at WMW that I have used for a long time now.
    A lot has changed over at WMW since this post. Take a look at their robots.txt now: http://www.webmasterworld.com/robots.txt


  11. #11
    notary sojac Herb ԿԬ's Avatar
    Join Date
    January 18th, 2005
    Location
    Central/Western NY State
    Posts
    7,741
    -->all; disallow root?

    ~OK, found the reference at the beginning, but can't run it on my sites, I don't think . . .

    all they want to see are
    if ($agent =~ /slurp/gi || $agent =~ /msnbot/gi || $agent =~ /Jeeves/gi || $agent =~ /googlebot/gi) {

    and if you don't have the right stuff in your cgi bin, oh well . . .
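
    The Perl test quoted above suggests the file is generated by a script that looks at the visitor's user-agent string. Here is a rough Python sketch of that idea. Only the four name patterns come from the quoted line; the two rule sets returned here are assumptions, since the thread does not show what the real script sends in either case.

```python
import re

# Same four names as the quoted Perl condition, matched case-insensitively.
ALLOWED = re.compile(r"slurp|msnbot|jeeves|googlebot", re.IGNORECASE)

# Assumed bodies: the real responses are not shown in this thread.
OPEN_RULES = "User-agent: *\nDisallow: /cgi-bin/\n"
CLOSED_RULES = "User-agent: *\nDisallow: /\n"


def robots_for(agent: str) -> str:
    """Pick the robots.txt body to serve for a given user-agent string."""
    return OPEN_RULES if ALLOWED.search(agent or "") else CLOSED_RULES


if __name__ == "__main__":
    print(robots_for("Mozilla/5.0 (compatible; Googlebot/2.1)") == OPEN_RULES)  # True
    print(robots_for("Wget/1.6") == CLOSED_RULES)                               # True
```

    Two caveats: a plain "googlebot" pattern also matches Googlebot-Image, and a user-agent string can be faked by whoever sends the request, so this only sorts out bots that identify themselves honestly.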

    ----------
    so, does anyone have a more modern version of the one first posted?

  12. #12
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    Quote Originally Posted by Herb ԿԬ
    -
    so, does anyone have a more modern version of the one first posted?

    A modern one would help -- or some type of [i.e. online] web form/entry -- I'm a little confused

    Thanks to everyone for the help.
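
    Short of an online form, a few lines of script can turn a list of bot names into a robots.txt in the same layout as the one posted earlier. This is only a sketch: the build_robots_txt helper is hypothetical, and which names belong on an up-to-date list is left to whoever maintains it.

```python
def build_robots_txt(blocked_agents, catch_all_disallow=("/cgi-bin/",)):
    """Build robots.txt text: one full-site block per named bot, then a * record."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    lines.append("User-agent: *")
    if catch_all_disallow:
        lines += [f"Disallow: {path}" for path in catch_all_disallow]
    else:
        lines.append("Disallow:")  # an empty Disallow means nothing is blocked
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two names taken from the list posted earlier in the thread.
    print(build_robots_txt(["Googlebot-Image", "WebCopier"]))
```

    Save the printed text as robots.txt and upload it to the root directory, as described earlier in the thread.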
