  1. #1
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Robots.txt questions
    What do I need in my robots.txt file to allow only Google, MSN, and Yahoo?


    Also, is there a way to ban scraper sites or other crappy sites from linking to your site?

    TIA

    Hope all is well with everyone,
    Doug

  2. #2
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Nice link, thanks.

    Question, though: if I disallow all, will the specific bots still crawl?
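
    For anyone wanting to check this locally, here is a rough sketch using Python's standard urllib.robotparser. It assumes the crawler tokens Googlebot, msnbot, and Slurp (confirm the current names in each engine's own documentation), and "SomeScraper" is a made-up name standing in for any unlisted bot. It only shows how a well-behaved parser reads the file; it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each named crawler gets an empty Disallow
# (meaning "nothing is off limits"); everyone else hits the catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "SomeScraper" is an invented name for a bot with no group of its own.
agents = ["Googlebot", "msnbot", "Slurp", "SomeScraper"]
allowed = {agent: parser.can_fetch(agent, "/index.html") for agent in agents}

for agent, ok in allowed.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

    With this parser, a group that names a bot takes precedence over the "*" group, so the three named crawlers are still allowed while the catch-all blocks the rest.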

  3. #3
    The Great Egress NewcastleB's Avatar
    Join Date
    January 19th, 2005
    Posts
    65
    Robots.txt won't do anything to stop the scraper bots. Most don't even bother reading it; others read it and ignore it. I'm testing out a spider trap right now. I've caught a few, but I still see a lot of suspicious activity.
    But are you still master of your domain?

  4. #4
    All Around Web Guy Cursal's Avatar
    Join Date
    January 18th, 2005
    Posts
    829
    Have to agree with NewcastleB.

    I have added all kinds of edits to my robots.txt file to keep out the crap, with little effect.
    I love the log spam I get...lol

  5. #5
    Full Member Zdig's Avatar
    Join Date
    February 26th, 2005
    Posts
    274
    Criminals aren't going to listen to your robots.txt requests.

  6. #6
    Prince of Content Vinny O'Hare's Avatar
    Join Date
    January 18th, 2005
    Posts
    3,126
    I just saw my CSS code indexed after adding a robots.txt. Example:
    http://www.example.cpm/abc.css
    How would I go about blocking that from getting indexed, since it is in my main root folder and not in a separate /folder/? Or should I not even care?
    Vinny O'Hare - OPM - Contact Info email: vinny at teamloxly.com ~ 702-582-6742 Twitter

  7. #7
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Los Angeles
    Posts
    4,053
    It doesn't make much difference, though Yahoo has been fetching CSS files.

    User-Agent: *
    Disallow: /stylesheetname.css

    http://www.robotstxt.org/wc/faq.html
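
    As a quick sanity check of the rule above, this sketch feeds it to Python's standard urllib.robotparser. The "Slurp" token and the /index.html path are assumptions for illustration only. It shows how a compliant parser would treat the two paths, not what any particular search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the post above, unchanged.
RULES = """\
User-agent: *
Disallow: /stylesheetname.css
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "Slurp" and "/index.html" are illustrative; any agent matches the "*" group.
css_ok = parser.can_fetch("Slurp", "/stylesheetname.css")
page_ok = parser.can_fetch("Slurp", "/index.html")

print("stylesheet fetchable:", css_ok)
print("home page fetchable:", page_ok)
```

    Disallow matches by path prefix, so a file in the root folder can be blocked on its own without moving it into a separate directory.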

  8. #8
    Prince of Content Vinny O'Hare's Avatar
    Join Date
    January 18th, 2005
    Posts
    3,126
    Thanks. Isn't it funny that you can't get a page you want listed in Yahoo, but you can get your CSS file into the search engine without even trying?

    I went to Yahoo and put in one of my URLs, and the first thing that popped up was the CSS file.
