  1. #1
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Robots.txt and MSN
    For some reason MSN isn't listing my website. It spidered the site once last month, looked for the robots.txt file (which I didn't have), and left without returning. I added a robots.txt and submitted my URL again. MSN has been coming back daily, twice today so far, but now it hits the robots.txt file and leaves. It's not crawling my site at all. Any thoughts? Below is the file I am using, which I copied from Ecom Mike a while back. Most likely I changed some of it, so I'm hoping someone can see a problem with it or any reason MSN is not even crawling the first page. Thanks. Oh, and Google now seems to be hitting just the robots.txt file and nothing else as well.

    <html>
    <head>
    <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
    <meta name="ProgId" content="FrontPage.Editor.Document">
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <title>User</title>
    </head>
    <body>
    <p><span class="ev_text_normal">User-agent:*<br>
    User-agent: Mediapartners-Google* <br>
    Disallow:<br>
    Disallow:/stats/<br>
    Disallow:/_private/<br>
    Disallow:/_borders/<br>
    Disallow:/_fpclass/<br>
    Disallow:/_overlay/<br>
    Disallow:/_themes/<br>
    Disallow:/_vti_bin/<br>
    Disallow:/_vti_cnf/<br>
    Disallow:/_vti_log/<br>
    Disallow:/_vti_pvt/<br>
    Disallow:/_vti_txt/<br>
    Disallow:/club/<br>
    User-agent: TurnitinBot<br>
    Disallow: /<br>
    User-agent: grub-client<br>
    Disallow: /<br>
    <br>
    User-agent: grub<br>
    Disallow: /<br>
    <br>
    User-agent: WebZip<br>
    Disallow: /<br>
    <br>
    User-agent: larbin<br>
    Disallow: /<br>
    <br>
    User-agent: b2w/0.1<br>
    Disallow: /<br>
    <br>
    User-agent: psbot<br>
    Disallow: /<br>
    <br>
    User-agent: Python-urllib<br>
    Disallow: /<br>
    <br>
    <br>
    User-agent: URL_Spider_Pro<br>
    Disallow: /<br>
    <br>
    User-agent: CherryPicker<br>
    Disallow: /<br>
    <br>
    User-agent: EmailCollector<br>
    Disallow: /<br>
    <br>
    User-agent: EmailSiphon<br>
    Disallow: /<br>
    <br>
    User-agent: WebBandit<br>
    Disallow: /<br>
    <br>
    User-agent: EmailWolf<br>
    Disallow: /<br>
    <br>
    User-agent: ExtractorPro<br>
    Disallow: /<br>
    <br>
    User-agent: CopyRightCheck<br>
    Disallow: /<br>
    <br>
    User-agent: Crescent<br>
    Disallow: /<br>
    <br>
    User-agent: SiteSnagger<br>
    Disallow: /<br>
    <br>
    User-agent: ProWebWalker<br>
    Disallow: /<br>
    <br>
    User-agent: CheeseBot<br>
    Disallow: /<br>
    <br>
    User-agent: LNSpiderguy<br>
    Disallow: /<br>
    <br>
    User-agent: Alexibot<br>
    Disallow: /<br>
    <br>
    User-agent: Teleport<br>
    Disallow: /<br>
    <br>
    User-agent: TeleportPro<br>
    Disallow: /<br>
    <br>
    User-agent: MIIxpc<br>
    Disallow: /<br>
    <br>
    User-agent: Telesoft<br>
    Disallow: /<br>
    <br>
    User-agent: Website Quester<br>
    Disallow: /<br>
    <br>
    User-agent: moget/2.1<br>
    Disallow: /<br>
    <br>
    User-agent: WebZip/4.0<br>
    Disallow: /<br>
    <br>
    User-agent: WebStripper<br>
    Disallow: /<br>
    <br>
    User-agent: WebSauger<br>
    Disallow: /<br>
    <br>
    User-agent: WebCopier<br>
    Disallow: /<br>
    <br>
    User-agent: NetAnts<br>
    Disallow: /<br>
    <br>
    User-agent: Mister PiX<br>
    Disallow: /<br>
    <br>
    User-agent: WebAuto<br>
    Disallow: /<br>
    <br>
    User-agent: TheNomad<br>
    Disallow: /<br>
    <br>
    User-agent: WWW-Collector-E<br>
    Disallow: /<br>
    <br>
    User-agent: RMA<br>
    Disallow: /<br>
    <br>
    User-agent: libWeb/clsHTTP<br>
    Disallow: /<br>
    <br>
    User-agent: asterias<br>
    Disallow: /<br>
    <br>
    User-agent: httplib<br>
    Disallow: /<br>
    <br>
    User-agent: turingos<br>
    Disallow: /<br>
    <br>
    User-agent: spanner<br>
    Disallow: /<br>
    <br>
    User-agent: InfoNaviRobot<br>
    Disallow: /<br>
    <br>
    User-agent: Harvest/1.5<br>
    Disallow: /<br>
    <br>
    User-agent: Bullseye/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)<br>
    Disallow: /<br>
    <br>
    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0<br>
    Disallow: /<br>
    <br>
    User-agent: CherryPickerSE/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: CherryPickerElite/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: WebBandit/3.50<br>
    Disallow: /<br>
    <br>
    User-agent: NICErsPRO<br>
    Disallow: /<br>
    <br>
    User-agent: Microsoft URL Control - 5.01.4511<br>
    Disallow: /<br>
    <br>
    User-agent: DittoSpyder<br>
    Disallow: /<br>
    <br>
    User-agent: Foobot<br>
    Disallow: /<br>
    <br>
    User-agent: SpankBot<br>
    Disallow: /<br>
    <br>
    User-agent: BotALot<br>
    Disallow: /<br>
    <br>
    User-agent: lwp-trivial/1.34<br>
    Disallow: /<br>
    <br>
    User-agent: lwp-trivial<br>
    Disallow: /<br>
    <br>
    User-agent: BunnySlippers<br>
    Disallow: /<br>
    <br>
    User-agent: Microsoft URL Control - 6.00.8169<br>
    Disallow: /<br>
    <br>
    User-agent: URLy Warning<br>
    Disallow: /<br>
    <br>
    User-agent: Wget/1.6<br>
    Disallow: /<br>
    <br>
    User-agent: Wget/1.5.3<br>
    Disallow: /<br>
    <br>
    User-agent: Wget<br>
    Disallow: /<br>
    <br>
    User-agent: LinkWalker<br>
    Disallow: /<br>
    <br>
    User-agent: cosmos<br>
    Disallow: /<br>
    <br>
    User-agent: moget<br>
    Disallow: /<br>
    <br>
    User-agent: hloader<br>
    Disallow: /<br>
    <br>
    User-agent: humanlinks<br>
    Disallow: /<br>
    <br>
    User-agent: LinkextractorPro<br>
    Disallow: /<br>
    <br>
    User-agent: Offline Explorer<br>
    Disallow: /<br>
    <br>
    User-agent: Mata Hari<br>
    Disallow: /<br>
    <br>
    User-agent: LexiBot<br>
    Disallow: /<br>
    <br>
    User-agent: Web Image Collector<br>
    Disallow: /<br>
    <br>
    User-agent: The Intraformant<br>
    Disallow: /<br>
    <br>
    User-agent: True_Robot/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: True_Robot<br>
    Disallow: /<br>
    <br>
    User-agent: BlowFish/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: JennyBot<br>
    Disallow: /<br>
    <br>
    User-agent: MIIxpc/4.2<br>
    Disallow: /<br>
    <br>
    User-agent: BuiltBotTough<br>
    Disallow: /<br>
    <br>
    User-agent: ProPowerBot/2.14<br>
    Disallow: /<br>
    <br>
    User-agent: BackDoorBot/1.0<br>
    Disallow: /<br>
    <br>
    User-agent: toCrawl/UrlDispatcher<br>
    Disallow: /<br>
    <br>
    User-agent: WebEnhancer<br>
    Disallow: /<br>
    <br>
    User-agent: suzuran<br>
    Disallow: /<br>
    <br>
    User-agent: VCI WebViewer VCI WebViewer Win32<br>
    Disallow: /<br>
    <br>
    User-agent: VCI<br>
    Disallow: /<br>
    <br>
    User-agent: Szukacz/1.4 <br>
    Disallow: /<br>
    <br>
    User-agent: QueryN Metasearch<br>
    Disallow: /<br>
    <br>
    User-agent: Openfind data gathere<br>
    Disallow: /<br>
    <br>
    User-agent: Openfind <br>
    Disallow: /<br>
    <br>
    User-agent: Xenu's Link Sleuth 1.1c<br>
    Disallow: /<br>
    <br>
    User-agent: Xenu's<br>
    Disallow: /<br>
    <br>
    User-agent: Zeus<br>
    Disallow: /<br>
    <br>
    User-agent: RepoMonkey Bait &amp; Tackle/v1.01<br>
    Disallow: /<br>
    <br>
    User-agent: RepoMonkey<br>
    Disallow: /<br>
    <br>
    User-agent: Microsoft URL Control<br>
    Disallow: /<br>
    <br>
    User-agent: Openbot<br>
    Disallow: /<br>
    <br>
    User-agent: URL Control<br>
    Disallow: /<br>
    <br>
    User-agent: Zeus Link Scout<br>
    Disallow: /<br>
    <br>
    User-agent: Zeus 32297 Webster Pro V2.9 Win32<br>
    Disallow: /<br>
    <br>
    User-agent: Webster Pro<br>
    Disallow: /<br>
    <br>
    User-agent: EroCrawler<br>
    Disallow: /<br>
    <br>
    User-agent: LinkScan/8.1a Unix<br>
    Disallow: /<br>
    <br>
    User-agent: Keyword Density/0.9<br>
    Disallow: /<br>
    <br>
    User-agent: Kenjin Spider<br>
    Disallow: /<br>
    <br>
    User-agent: Iron33/1.0.2<br>
    Disallow: /<br>
    <br>
    User-agent: Bookmark search tool<br>
    Disallow: /<br>
    <br>
    User-agent: GetRight/4.2<br>
    Disallow: /<br>
    <br>
    User-agent: FairAd Client<br>
    Disallow: /<br>
    <br>
    User-agent: Gaisbot<br>
    Disallow: /<br>
    <br>
    User-agent: Aqua_Products<br>
    Disallow: /<br>
    <br>
    User-agent: Radiation Retriever 1.1<br>
    Disallow: /<br>
    <br>
    User-agent: WebmasterWorld Extractor<br>
    Disallow: /<br>
    <br>
    User-agent: Flaming AttackBot<br>
    Disallow: /<br>
    <br>
    User-agent: Oracle Ultra Search<br>
    Disallow: /<br>
    <br>
    User-agent: PerMan<br>
    Disallow: /<br>
    <br>
    User-agent: searchpreview<br>
    Disallow: /</span></p>

    </body>

    </html>

  2. #2
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    One other question! Would this work as a robots.txt file, with just these 3 lines?

    # Allow all
    User-agent: *
    Disallow:
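
One way to check the three-line file rather than guess is a minimal sketch like the one below. It assumes Python's standard `urllib.robotparser` module is a fair stand-in for a crawler's parser; MSN's and Google's own parsers may behave differently. The `allows` helper, the agent names, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The three-line "allow all" file from the post above.
ALLOW_ALL = """\
# Allow all
User-agent: *
Disallow:
"""

def allows(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# An empty Disallow value means nothing is disallowed for the matching agents.
print(allows(ALLOW_ALL, "msnbot", "http://www.example.com/"))
print(allows(ALLOW_ALL, "Googlebot", "http://www.example.com/club/page.html"))
```

Under this parser both calls should come back True, which matches the usual reading that an empty `Disallow:` blocks nothing.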

    Also I'm assuming I shouldn't have the code below in the first file?

    <html>
    <head>
    <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
    <meta name="ProgId" content="FrontPage.Editor.Document">
    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <title>User</title>
    </head>
    <body>


    Thanks!

  3. #3
    Analytics Dude Kevin's Avatar
    Join Date
    January 18th, 2005
    Location
    Rochester, NY
    Posts
    5,904
    User-agent: Microsoft URL Control - 5.01.4511<br>
    Disallow: /<br>
    I'm not familiar with these... What are they? (Not saying it's the problem...)
    Kevin Webster
    twitter: levelanalytics

    Kayak Fishing
    Web Analytics and Affiliate Marketing

  4. #4
    The slot machine that IS paid! Billy Kay's Avatar
    Join Date
    January 18th, 2005
    Location
    Small Town in Tennessee
    Posts
    5,226
    Packy

    A robot file is a TEXT file

    get rid of all the html markup stuff

  5. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    A robot file is a TEXT file

    get rid of all the html markup stuff
    An HTML page is the wrong place for all of those. They belong in a file called robots.txt that goes in the root directory, the same place as the site's main index page. No markup goes in it at all; it's just a simple text file.

    http://www.robotstxt.org/wc/exclusion-admin.html
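
A quick way to spot this kind of mistake on other sites is to scan the file for anything that looks like an HTML tag. This is only a rough sketch: the `find_markup_lines` helper, the regular expression, and the two sample strings are illustrative assumptions, not part of any robots.txt tool.

```python
import re

# Matches things that look like HTML tags, e.g. <html>, <br>, </span>.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")

def find_markup_lines(robots_txt: str) -> list:
    """Return (line_number, line) pairs for every line containing a tag."""
    hits = []
    for number, line in enumerate(robots_txt.splitlines(), start=1):
        if TAG_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits

# A shortened version of the FrontPage-wrapped file from the first post.
BROKEN = "<html>\n<body>\n<p>User-agent: *<br>\nDisallow: /stats/<br>\n</body>\n</html>\n"
# The same rules as plain text.
CLEAN = "User-agent: *\nDisallow: /stats/\n"

for number, line in find_markup_lines(BROKEN):
    print(f"line {number} contains markup: {line}")
print("clean file is markup-free:", find_markup_lines(CLEAN) == [])
```

Any hit means the file should be re-saved as plain text before a crawler sees it.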

  6. #6
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Thanks Billy Kay and Webworker.

    I think where I messed up was that I opened a txt file from another site of mine, copied it, and pasted it into a FrontPage template. I fixed the problem and put it into a Notepad txt file. I'd best go check some of my other sites to see if I did the same thing. Thanks for the link also, Webworker.

    Noth, I haven't a clue what

    User-agent: Microsoft URL Control - 5.01.4511<br>
    Disallow: /<br>

    And some of the others are. I copied the code here from Ecom Mike. If I were to guess I would say it's some kind of Bad bot or hijacker or something but I am just guessing. I'm sure Mike or someone else knows. Thanks again All
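
If the same paste-into-a-template mistake shows up on several sites, the cleanup can be scripted instead of redone by hand in Notepad. The sketch below assumes the page has the same shape as the file in the first post: one rule per source line, each ending in `<br>`, with lone `<br>` lines standing in for blank lines. The `strip_markup` helper and the sample `PAGE` are illustrative, and the result should still be checked by eye.

```python
import html
import re

def strip_markup(page: str) -> str:
    """Recover plain robots.txt text from an HTML-wrapped copy."""
    # Drop the whole <head> section so the <title> text does not leak in.
    text = re.sub(r"(?is)<head.*?</head>", "", page)
    # Remove every remaining tag; the source newlines already separate rules.
    text = re.sub(r"<[^>]+>", "", text)
    # Turn entities such as &amp; back into literal characters.
    text = html.unescape(text)
    lines = [line.strip() for line in text.splitlines()]
    # Collapse runs of blank lines to one and trim the ends.
    body = re.sub(r"\n{2,}", "\n\n", "\n".join(lines)).strip()
    return body + "\n"

PAGE = """<html>
<head>
<title>User</title>
</head>
<body>
<p><span class="ev_text_normal">User-agent: *<br>
Disallow: /stats/<br>
<br>
User-agent: RepoMonkey Bait &amp; Tackle/v1.01<br>
Disallow: /</span></p>
</body>
</html>
"""

print(strip_markup(PAGE), end="")
```

Blank lines are kept because they separate one User-agent record from the next.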

  7. #7
    ABW Ambassador
    Join Date
    November 26th, 2005
    Posts
    560
    Yes, those HTML tags must be messing it up. From what I've seen, MSN is the easiest engine to get rankings in, and it spiders easily.
    The Best Forums
    Nothing in the world can take the place of persistence. Talent will not; nothing is more common than unsuccessful men with talent. Genius will not; unrewarded genius is almost a proverb. Education will not; the world is full of educated derelicts. Persistence and determination are omnipotent.
    Abestweb Store

  8. #8
    ABW Ambassador
    Join Date
    November 26th, 2005
    Posts
    560
    Also, are you trying to block some bots? If not, why block any bots unless your site is huge and the bots are eating a lot of bandwidth?
    Just asking - not saying that you shouldn't block bots in the first place.

  9. #9
    Analytics Dude Kevin's Avatar
    Join Date
    January 18th, 2005
    Location
    Rochester, NY
    Posts
    5,904
    Newaff. Some bots are like some people... you just don't want them hanging around. They take without asking, and then claim all your best ideas as their own...

  10. #10
    ABW Ambassador Packy's Avatar
    Join Date
    January 18th, 2005
    Location
    Syracuse
    Posts
    4,205
    Quote Originally Posted by newaff
    Yes, those HTML tags must be messing it up. From what I've seen, MSN is the easiest engine to get rankings in, and it spiders easily.
    Well, MSN did crawl all the pages after the change, but no listing yet. What has me confused about MSN lately is that they used to crawl all sites with or without a robots.txt file, but lately they have hit a couple of my sites just looking for the file and left when they couldn't find one.

    As for blocking some of the robots, I believe a lot of them are scum bots just scraping or something. Again, the more knowledgeable would know more than me about that. I am testing a robots.txt file without blocking anything to see how that goes.

  11. #11
    Full Member Tech Evangelist's Avatar
    Join Date
    March 16th, 2005
    Location
    Mesa, AZ
    Posts
    374
    The robots.txt file will never block scum bots or scrapers. The use of the robots.txt file is completely voluntary by the bots. I don't think there is a bad guy out there that pays any attention to it. Most of the junk in the file you posted is worthless.
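
To illustrate why compliance is voluntary: the check happens inside the crawler's own code, not on the server. Here is a minimal sketch, again assuming Python's standard `urllib.robotparser`; the `RULES` text, the `may_fetch` helper, and the example.com URLs are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Two records in the style of the posted file: one blocked bot, one default.
RULES = """\
User-agent: WebZip
Disallow: /

User-agent: *
Disallow: /stats/
"""

def may_fetch(agent: str, url: str) -> bool:
    """Return what the rules ask of `agent`; nothing here enforces the answer."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

# A polite crawler calls may_fetch() and skips the URL when it returns False.
# A scraper simply never makes the call and requests the page anyway.
print(may_fetch("WebZip", "http://www.example.com/index.html"))
print(may_fetch("msnbot", "http://www.example.com/index.html"))
print(may_fetch("msnbot", "http://www.example.com/stats/report.html"))
```

Actually keeping a misbehaving bot out takes something on the server side, which robots.txt does not provide.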

    Webworker is right. Your HTML markup was clearly part of the problem. There is no HTML markup in a robots.txt file. Also, make sure that you use Notepad or another pure text editor. Invisible headers and other info inserted by Word and some code editors can mess it up. It must be a pure text file.
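
The "invisible headers" problem can be checked at the byte level. The sketch below is a rough heuristic under stated assumptions: it looks for a UTF-8 byte order mark, the usual opening bytes of RTF, ZIP-based (.docx) and older Word files, and stray control characters. The `plain_text_problems` helper is hypothetical, and whether a given crawler tolerates any of these is not something this check can tell you.

```python
def plain_text_problems(data: bytes) -> list:
    """Return human-readable reasons `data` may not be a pure text file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte order mark")
    if data.startswith(b"{\\rtf"):
        problems.append("looks like an RTF document")
    if data.startswith(b"PK") or data.startswith(b"\xd0\xcf\x11\xe0"):
        problems.append("looks like a binary word-processor file")
    # Tab, line feed, and carriage return are the only control bytes expected.
    if any(byte < 0x20 and byte not in (0x09, 0x0A, 0x0D) for byte in data):
        problems.append("contains control characters")
    return problems

print(plain_text_problems(b"User-agent: *\r\nDisallow:\r\n"))
print(plain_text_problems(b"\xef\xbb\xbfUser-agent: *\nDisallow:\n"))
print(plain_text_problems(b"{\\rtf1\\ansi User-agent: *}"))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in; an empty list means none of these warning signs were found.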
    Last edited by Tech Evangelist; February 12th, 2006 at 03:16 PM.
    There's good, fast and cheap. Pick any two.
    Phoenix SEO (http://www.topranksolutions.com) :: Affiliate Marketing Tutorials (http://www.tech-evangelist.com/category/affiliate-marketing/)
