View Poll Results: Do you use a robots.txt?

Voters: 19
  • Yes: 14 (73.68%)
  • No: 5 (26.32%)
  • I don't know: 0 (0%)
  1. #1
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    I do use a robots.txt. The reason I ask is that I was thinking of giving AdSense a try, but a friend just sent me this today:

    "If you have a robots.txt file, you'll need to remove it or add the following two lines to your robots.txt to allow our content bot to crawl your site"

    User-agent: Mediapartners-Google*
    Disallow:
    ------------------
    My only concern is the part that says "you'll need to remove it", because I think many new website owners may not know the benefits of a robots.txt.
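
For anyone who wants to test the effect of those two lines before editing a live file, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample file, the bot name `SomeOtherBot`, and the helper `is_allowed` are made up for illustration. The trailing `*` from the quoted lines is dropped in the sample because Python's parser compares the name literally; other crawlers may treat it differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every bot is shut out except the AdSense crawler.
# The record with an empty "Disallow:" grants that one bot full access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Mediapartners-Google", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "/page.html"))
```

This only shows how one parser reads the file; it suggests the existing robots.txt can stay in place as long as the AdSense bot has its own permissive record.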

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,663
    I use a robots.txt file, and Googlebot and the Media Partners bot crawl every directory on my sites that is not blocked for all bots. I do not add any lines as you describe. I only use the robots.txt to block all spiders from private directories and to specifically block annoying bots from all directories.

    I suspect you are safe in not adding the text lines unless you block all bots from directories you wish spidered by google. Even in that case, I do not know that the text will allow Google in.

    Before touching an otherwise perfectly good robots.txt file, I'd suggest putting AdSense on your pages, opening the pages online (don't click on the ads!), and then checking your log files. Google's Media Partners bot shows up on new pages almost instantly and you'll see it in the log.
    Wayne
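
The log check described above could be done by hand, or with a few lines of Python like the sketch below. The sample log lines, IP addresses, and the function name `adsense_bot_hits` are invented for illustration, and the sketch assumes the bot identifies itself with a user-agent string containing `Mediapartners-Google`.

```python
def adsense_bot_hits(log_lines, needle="Mediapartners-Google"):
    """Return the log lines that mention the AdSense crawler (case-insensitive)."""
    needle = needle.lower()
    return [line for line in log_lines if needle in line.lower()]


# Hypothetical access-log entries in Apache combined log format.
SAMPLE_LOG = [
    '203.0.113.7 - - [18/Jan/2005:10:00:01 -0500] "GET /new-page.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '203.0.113.9 - - [18/Jan/2005:10:00:03 -0500] "GET /new-page.html HTTP/1.1" '
    '200 5120 "-" "Mediapartners-Google/2.1"',
]

if __name__ == "__main__":
    for hit in adsense_bot_hits(SAMPLE_LOG):
        print(hit)
```

To run it against a real log, pass an open file object instead of `SAMPLE_LOG`; the path to that file depends on your host.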

  3. #3
    Full Member
    Join Date
    January 18th, 2005
    Posts
    202
    For the newbies, I think you need to clarify your post some.

    A robots.txt file is optional and not having one is the same as allowing all bots full access to your web site.

    The only use for a robots.txt file is to restrict access to directories. That restriction can be set on a spider-by-spider basis, with different directories for each spider.

    AdSense will need full access in order to serve content-sensitive ads.
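
A short sketch of both points, using Python's standard `urllib.robotparser`: an empty file restricts nobody, and each spider can be given its own list of off-limits directories. The bot names `BotA` and `BotB`, the directories, and the helper `can_fetch` are hypothetical, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# An empty file carries no rules, so nothing is restricted.
NO_RULES = ""

# Hypothetical bots and directories: each spider gets its own record.
PER_SPIDER = """\
User-agent: BotA
Disallow: /private/

User-agent: BotB
Disallow: /images/
"""

if __name__ == "__main__":
    print(can_fetch(NO_RULES, "AnyBot", "/anything.html"))
    print(can_fetch(PER_SPIDER, "BotA", "/private/a.html"))
    print(can_fetch(PER_SPIDER, "BotB", "/private/a.html"))
```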

  4. #4
    ABW Ambassador buy_online's Avatar
    Join Date
    January 18th, 2005
    Location
    Richmond, VA
    Posts
    3,234
    What SJohnson said above. I'll go one step further and say that you really should have one anyway, even if you are allowing every spider/bot to crawl your entire site. Without one, you will see error messages in your logs, because bots look for that file when they hit your front door.
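
To get a rough idea of how many of those errors a missing file generates, something like the following sketch could count them. It assumes Apache-style common or combined log lines; the sample entries and the name `missing_robots_requests` are made up for illustration.

```python
import re

# Captures the requested path and the status code from a log line such as:
#   ... "GET /robots.txt HTTP/1.0" 404 209 ...
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def missing_robots_requests(log_lines):
    """Count requests for /robots.txt that were answered with a 404."""
    count = 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1) == "/robots.txt" and match.group(2) == "404":
            count += 1
    return count


# Hypothetical log entries.
SAMPLE_LOG = [
    '203.0.113.5 - - [18/Jan/2005:10:00:00 -0500] "GET /robots.txt HTTP/1.0" 404 209 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [18/Jan/2005:10:00:01 -0500] "GET /index.html HTTP/1.0" 200 4096 "-" "ExampleBot/1.0"',
    '203.0.113.8 - - [18/Jan/2005:11:30:00 -0500] "GET /robots.txt HTTP/1.0" 200 64 "-" "OtherBot/2.0"',
]

if __name__ == "__main__":
    print(missing_robots_requests(SAMPLE_LOG))
```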

    These folks can be very helpful.

    Fred

    You might just be a Redneck if - Birds are attracted to your beard...

  5. #5
    Full Member ellen-s4y's Avatar
    Join Date
    January 18th, 2005
    Posts
    489
    I use robots.txt to disallow bots from certain directories, cgi-bin, etc.

    I use htaccess to ban bad bots that don't bother to look at robots.txt.

  6. #6
    ABW Ambassador Andy's Avatar
    Join Date
    January 18th, 2005
    Posts
    4,178
    ellen-s4y has it right. Use robots.txt to let the good bots know where to not go, and use .htaccess to ban bad bots, as they likely won't pay attention to your robots.txt file in the first place!

    Andy
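
As a rough sketch of the .htaccess half of that advice, the Python below generates mod_rewrite lines that answer 403 Forbidden to a list of user agents. It assumes Apache hosting with mod_rewrite enabled and .htaccess overrides permitted; the function name `htaccess_block_rules` is hypothetical, and the two bot names are simply taken from the kind of list people block.

```python
import re


def htaccess_block_rules(bot_names):
    """Build mod_rewrite lines that return 403 Forbidden to the given user agents."""
    if not bot_names:
        return ""
    lines = ["RewriteEngine On"]
    last = len(bot_names) - 1
    for index, name in enumerate(bot_names):
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC]" if index == last else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} ^{re.escape(name)} {flags}")
    # "-" leaves the URL untouched; F sends 403 and L stops further rewriting.
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines)


if __name__ == "__main__":
    print(htaccess_block_rules(["EmailSiphon", "WebZip"]))
```

Unlike robots.txt, these rules are enforced by the server, so they work even against bots that never read robots.txt. A bot that fakes its user-agent string will still get through.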

    _______________
    Call the Exterminators! We've Got PARASITES!

  7. #7
    Full Member
    Join Date
    January 18th, 2005
    Posts
    202
    "use .htaccess to ban bad bots"

    Provided you're on some flavor of Unix hosting.

    Windows does not have a .htaccess and banning is not as easy.

  8. #8
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    quote:
    "If you have a robots.txt file, you'll need to remove it or add the following two lines to your robots.txt to allow our content bot to crawl your site"

    User-agent: Mediapartners-Google*
    Disallow:
    ------------------
    Not sure I even understand that. That robots.txt directive is telling the spider to do exactly what it was going to do in any case, so why would you need it?

    Search Engine Positioning - 1 Design 4 Life

  9. #9
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    Here's my Robots.txt file ...feel free to copy it

    User-agent: *
    User-agent: Mediapartners-Google*
    Disallow:
    Disallow: /stats/
    Disallow: /_private/
    Disallow: /_borders/
    Disallow: /_fpclass/
    Disallow: /_overlay/
    Disallow: /_themes/
    Disallow: /_vti_bin/
    Disallow: /_vti_cnf/
    Disallow: /_vti_log/
    Disallow: /_vti_pvt/
    Disallow: /_vti_txt/
    Disallow: /images/
    Disallow: /club/

    User-agent: TurnitinBot
    Disallow: /

    User-agent: scooter
    Disallow: /

    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /


    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: ia_archiver
    Disallow: /

    User-agent: ia_archiver/1.6
    Disallow: /

    User-agent: Alexibot
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: WebmasterWorld Extractor
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /

    Mike & Charlie ...

    If they won't adopt and feed a bird ..flip them one! BBQ some Gator and remember to flush WhenU..

  10. #10
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    United Kingdom
    Posts
    1,797
    That's a pretty comprehensive robots.txt. Why have you disallowed AltaVista, though?


  11. #11
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    Which one is altavista?


  12. #12
    Full Member
    Join Date
    January 18th, 2005
    Posts
    362
    quote:
    Originally posted by EcomCity.com:
    Which one is altavista?
    ------------------

    Scooter is AltaVista's crawler.
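
With a block list as long as the one in post #9, it is easy to lose track of who is shut out. The sketch below lists every user agent that a robots.txt bans from the whole site. It is a simplified reader written for illustration, not a full parser: the function name `fully_blocked_agents` is hypothetical, and the sample is a trimmed-down file in the same style.

```python
def fully_blocked_agents(robots_txt):
    """List user agents whose record contains a site-wide "Disallow: /"."""
    blocked = []
    agents, seen_rule = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                # A User-agent line after a rule starts a new record.
                agents, seen_rule = [], False
            agents.append(value)
        elif field == "disallow":
            seen_rule = True
            if value == "/":
                for agent in agents:
                    if agent not in blocked:
                        blocked.append(agent)
    return blocked


# Trimmed-down sample in the style of the file posted above.
SAMPLE = """\
User-agent: *
Disallow: /stats/

User-agent: scooter
Disallow: /

User-agent: grub-client
Disallow: /
"""

if __name__ == "__main__":
    print(fully_blocked_agents(SAMPLE))
```

Checking the resulting names against a list of known search engine crawlers would catch a case like scooter before it costs any listings.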
