  1. #1
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Hey all,
    I originally posted this in http://abw.infopop.cc/eve/ubb.x?a=tp...89&m=574103205
    but it seems like that's a dead thread, so I am starting a new one.

    I need some help. First, thank you for that post; I am new to AM and did not know there were bad bots. I have created a robots.txt file (copied from that post, thanks again) and uploaded it to my servers.

    My question: my most visited page is http://xxxxxxx/_vti_bin/_vti_aut/author.exe

    which I thought would be blocked from the bots by the code I pasted in.

    My third most visited is http://xxxxxxx/lines.swf

    which I also thought would be blocked.

    Did I do something incorrectly in setting up my robots file and, more importantly, is this having an effect on my being listed in the search engines?

    Thank you for all you do.

    Spider Man

  2. #2
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    "Did I do something incorrectly in setting up my robots file"

    Without seeing it, it's hard to say.

    But it could be that you are getting visits from "rogue spiders"--that is, those that ignore robots.txt files.

    "and, more importantly, is this having an effect on my being listed in the search engines?"

    Any SEs whose bots are blocked won't list you. But there *shouldn't* be a problem with the unblocked ones, provided the robots.txt is configured right...
    There is no knowledge that is not power. ~Hemingway

  3. #3
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Leader,

    Thanks for posting. Here is what I used to create the file. It was posted on this board in the post I referred to in this one. I made a few changes based on that post.

    User-agent:*
    User-agent: Mediapartners-Google*
    Disallow:
    Disallow:/stats/
    Disallow:/_private/
    Disallow:/_borders/
    Disallow:/_fpclass/
    Disallow:/_overlay/
    Disallow:/_themes/
    Disallow:/_vti_bin/
    Disallow:/_vti_cnf/
    Disallow:/_vti_log/
    Disallow:/_vti_pvt/
    Disallow:/_vti_txt/
    Disallow:/images/
    Disallow:/club/
    User-agent: TurnitinBot
    Disallow: /
    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /


    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: ia_archiver
    Disallow: /

    User-agent: ia_archiver/1.6
    Disallow: /

    User-agent: Alexibot
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: WebmasterWorld Extractor
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /

  4. #4
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    I don't see any glaring faults, but there may be something that I'm missing.

    You might want to take a look at this site:

    http://www.robotstxt.org/wc/exclusion-admin.html

    It gives a plain-English description of the proper way to do a robots.txt. I don't see anything that doesn't match up, but just in case I'm missing something, it'd be good to check there.

    You probably would have gotten some more responses to this thread, but the "new posts" is broken again so it was probably missed by a lot of people. When that feature is re-fixed, give this thread a bump if you still need help and maybe some other people will see it...

  5. #5
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Thanks, I will look. Does it matter in what order the file is placed on my server? Right now it is in the last position because it is the newest I've created. Does that mean the spider looks at all the other ones first and then gets to this one and says, oops, I should not have looked?

    Leader, glad you're around and looking at posts, since New Posts is not working.

    Spider Man

  6. #6
    Resident Genius and Staunch Capitalist Leader's Avatar
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    "Does it matter in what order the file is placed on my server? Right now it is in the last position because it is the newest I've created. Does that mean the spider looks at all the other ones first"

    A properly-written spider will call specifically for "robots.txt" before doing anything else on your site, so it shouldn't matter how old that file is.

    I've spotted less well-written spiders (but that still obey robots.txt) doing things like checking a couple of pages *and then* calling for the robots.txt. But I don't think that's got anything to do with the age of the files either--instead, it seems to be a problem with how the bots are written.

    In any case, they're not spidering their way down to the robots.txt file. Non-rogue spiders will call for it by name.
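
To illustrate "call for it by name": a polite spider works out the robots.txt address from the site's host instead of finding it by following links. A minimal sketch in Python, where example.com and the function name are placeholders of mine and not taken from any particular spider:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; robots.txt always sits at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/club/index.html"))
# prints http://example.com/robots.txt
```

Whatever page the spider starts from, the address comes out the same, which is why the file's age or position in a directory listing should not matter.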

  7. #7
    Full Member
    Join Date
    January 18th, 2005
    Posts
    480
    I do believe your first three lines are strange. You should not have the second line, as it is redundant with the first.

    Your third line might actually block everything from all bots...

    But I am not an expert. I just think you ought to be very careful with that * character and the Disallow without anything following it.
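
For what it's worth, the two forms are easy to tell apart with a parser. The sketch below uses Python's standard `urllib.robotparser`, which is my choice of tool and not something from this thread; `SomeBot` and example.com are made-up names. It compares a Disallow with nothing after it against `Disallow: /`. Other crawlers may handle edge cases differently, so treat this as one parser's reading.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, url):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

EMPTY_DISALLOW = """\
User-agent: *
Disallow:
"""

DISALLOW_ROOT = """\
User-agent: *
Disallow: /
"""

url = "http://example.com/index.html"
print(allowed(EMPTY_DISALLOW, "SomeBot", url))  # True: an empty value blocks nothing
print(allowed(DISALLOW_ROOT, "SomeBot", url))   # False: "/" blocks the whole site
```

So under this reading the empty Disallow is the permissive one, and it is `Disallow: /` that shuts every bot out.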

  8. #8
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    Witness is right. The first three lines of your robots.txt are messed up.

    User-agent:*
    User-agent: Mediapartners-Google*
    Disallow:

    should be

    User-agent: Mediapartners-Google*
    Disallow:
    User-agent: *

    There should be a space between the colon and the star as well.

    Also, you haven't got a disallow on your .swf file so it will still be spidered, although as it's Flash, most spiders will just ignore it anyway.

    If you really want to be sure, you can have this as the first four lines:

    User-agent: Mediapartners-Google*
    Disallow:
    User-agent: *
    Disallow: /lines.swf
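
One way to sanity-check the two layouts is to run them through a parser. This is a rough sketch using Python's standard `urllib.robotparser`, which is only one implementation, and real spiders may read the file differently. `SomeBot` and example.com are placeholders, and only a few of the Disallow lines are reproduced. With this parser the first matching rule in a record wins, so the empty Disallow at the top of the original record lets everything through, while the rearranged version blocks both paths.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, agent, path):
    """Parse robots.txt text and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)

# Start of the file as originally posted (shortened).
ORIGINAL = """\
User-agent:*
User-agent: Mediapartners-Google*
Disallow:
Disallow:/_vti_bin/
"""

# Rearranged so the empty Disallow applies only to Mediapartners-Google.
REARRANGED = """\
User-agent: Mediapartners-Google*
Disallow:

User-agent: *
Disallow: /lines.swf
Disallow: /_vti_bin/
"""

for name, text in (("original", ORIGINAL), ("rearranged", REARRANGED)):
    for path in ("/_vti_bin/_vti_aut/author.exe", "/lines.swf"):
        # True means SomeBot may fetch the path, False means it is blocked.
        print(name, path, allowed(text, "SomeBot", path))
```

The rule paths are compared against the URL path, which always begins with a slash, so in this parser a rule written as `lines.swf` without the leading slash would not match anything.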

  9. #9
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Maybe I'm reading this wrong, but isn't Mediapartners-Google the bot Google uses for AdSense? If you don't have an AdSense ad on that page, it won't visit that URL. It's Googlebot that you want to keep out.

    Mike
    Cow Dance

  10. #10
    Full Member
    Join Date
    January 18th, 2005
    Posts
    235
    Hey everyone, thanks for looking and posting. I am making changes, but the spiders that are visiting are ignoring the information anyway. Not sure that it is a big deal.

    Spider Man

  11. #11
    I like traffic lights
    Join Date
    January 18th, 2005
    Location
    Southern hemisphere - away from Fukushima
    Posts
    2,936
    Most of those bots listed probably ignore robots.txt.

    You want to be 403 blocking them in your server config.

    And blocking Mediapartners-Google is not a good idea (as mentioned above) if you want your AdSense adverts to be relevant.
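
The 403 idea normally lives in the web server's own configuration, which depends on the server you run. Purely to illustrate the logic, here is the same check sketched as Python WSGI middleware. The blocklist, the stand-in `site` app, and the helper names are all hypothetical, and matching on the User-Agent header only stops bots that report their name honestly.

```python
# Hypothetical blocklist: a few agent names borrowed from the robots.txt in this thread.
BLOCKED_AGENTS = ("emailsiphon", "webzip", "sitesnagger")

def block_bad_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from listed user agents get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked):
            body = b"Forbidden"
            headers = [("Content-Type", "text/plain"),
                       ("Content-Length", str(len(body)))]
            start_response("403 Forbidden", headers)
            return [body]
        # Anyone else is passed through to the real site untouched.
        return app(environ, start_response)
    return middleware

def site(environ, start_response):
    """Stand-in for the real site: always answers 200."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

def status_for(agent):
    """Send one fake request through the wrapped app; return its status line."""
    seen = []
    def start_response(status, headers):
        seen.append(status)
    app = block_bad_bots(site)
    list(app({"HTTP_USER_AGENT": agent}, start_response))
    return seen[0]

print(status_for("EmailSiphon"))  # 403 Forbidden
print(status_for("Mozilla/5.0"))  # 200 OK
```

Unlike robots.txt, this does not rely on the bot's cooperation: the request is refused whether or not the bot ever read the file.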
