  1. #1
    ABW Ambassador AddHandler's Avatar
    Join Date
    January 19th, 2005
    Posts
    1,270
    How To Block Robots...?? Help..
    Hello,
    I have a folder I do not want indexed in the SEs. I added a Disallow to my robots.txt file for the folder the pages are in...

    BUT - that stupid GOOGLEBOT goes in and indexes them ANYWAY..


    This is what I have in my ROBOTS.txt file...
    ________________________________________
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /myfolder/
    ________________________________________

    Is that not correct??? Or should I make a Disallow for each SPECIFIC ROBOT..??

    I hate it when they don't follow the rules. Now I have all these pages indexed in Google.. that shouldn't be...

    HELP!
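
For what it's worth, the rules posted above can be checked offline with Python's standard urllib.robotparser. This is only a sketch: www.example.com and the page names are placeholders, not the real site. Keep in mind that robots.txt asks a crawler not to fetch a URL; it does not by itself keep a URL that other pages link to out of the listings.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted in this thread
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /myfolder/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# www.example.com is a placeholder domain
blocked = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/myfolder/page.html")
open_page = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html")
print(blocked, open_page)
```

If the first value is False, the file itself is saying what was intended, and the listings are coming from somewhere other than a syntax mistake.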

  2. #2
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
    I have done it like this, SuZe (below), without the end slash, but there are many that just ignore the robots.txt.

    I think the only way to stop them is to put an .htaccess file in the folders you want to keep them out of and list all their IPs in this file.

    This is the main reason I wanted to use PHP to keep updating the .htaccess file.
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /data
    Disallow: /images
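
A side note on the missing end slash: Disallow rules are matched as plain path prefixes, so /data and /data/ are not quite the same rule. A small sketch with Python's standard urllib.robotparser shows the difference. The paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def blocked_paths(disallow_rule, paths):
    """Return the paths that one 'Disallow:' rule under 'User-agent: *' blocks."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_rule}"])
    return [p for p in paths if not parser.can_fetch("*", p)]

# Hypothetical paths, for illustration only
paths = ["/data/list.html", "/database.html", "/data-old/a.html"]

no_slash = blocked_paths("/data", paths)     # prefix match without the slash
with_slash = blocked_paths("/data/", paths)  # only what sits inside the folder
print(no_slash)
print(with_slash)
```

Without the slash, anything whose path merely starts with /data is covered, which may or may not be what you want.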

    PS. I have noticed on at least one of my sites Google has listed every cloaked link I have despite it supposedly being banned via my robots.txt

  3. #3
    ABW Ambassador AddHandler's Avatar
    Join Date
    January 19th, 2005
    Posts
    1,270
    Yep that is what is happening to me..

    My cloaked link pages are showing up in the SEs...

    EVEN though there is NO DATA on those pages at all..
    They will probably never be seen by anyone but me searching for my DOMAIN name specifically but... I still don't like those redirect pages being listed in the SEs.. it makes ME LOOK BAD... even though I have done everything I can to prevent it.. these ROBOTS don't follow rules!

    I guess I could add more to the cloaked pages..
    like a meta saying NO FOLLOW - NO INDEX..
    Do you think that would work...??

    ALL I have on those pages now is a PHP redirect...
    It's fast and can't be seen.. EXCEPT by the dirty non-rule-following ROBOTS..



    I THOUGHT THE GOOGLE BOT WAS ONE OF THE ROBOTS THAT ACTUALLY PLAYED BY THE RULES OF ROBOTS.TXT... I guess they figure they DO NOT HAVE TO FOLLOW ANYONE'S RULES....

    ONE MORE REASON TO ABSOLUTELY HATE GOOGLE....
    GO AWAY GOOGLE... seeing as you cannot play FAIR..

  4. #4
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    Texas, USA
    Posts
    579
    Quote Originally Posted by SuZe

    I guess I could add more to the cloaked pages..
    like a meta saying NO FOLLOW - NO INDEX..
    Do you think that would work...??



    I THOUGHT THE GOOGLE BOT WAS ONE OF THE ROBOTS THAT ACTUALLY PLAYED BY THE RULES OF ROBOTS.TXT...


    Hi, SuZe.

    I put robot meta tags on each page of my web site. The only Bot trouble I have had is some wild weasel Bot from China, which I locked out of my site by entering its IP# in my Apache web server IP Deny Manager.

    Some robots don't play by the rules. Some are not who they say they are.
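
Locking a bot out by IP, as described above, usually comes down to a few Deny lines in an .htaccess file. Here is a rough Python sketch that builds such a block from a list of addresses. The addresses are documentation-range placeholders, build_deny_block is a made-up helper name, and the output uses the older Apache 2.2 "Order/Deny" syntax. On Apache 2.4 the equivalent is written with "Require not ip" inside a RequireAll section, so check which version your host runs.

```python
import ipaddress

def build_deny_block(addresses):
    """Return Apache 2.2-style .htaccess lines denying the given IPs or CIDR ranges."""
    lines = ["Order Allow,Deny", "Allow from all"]
    for addr in addresses:
        # ip_network() validates the value and raises ValueError on garbage input
        network = ipaddress.ip_network(addr, strict=False)
        if network.num_addresses == 1:
            lines.append(f"Deny from {network.network_address}")
        else:
            lines.append(f"Deny from {network.with_prefixlen}")
    return "\n".join(lines) + "\n"

# Placeholder addresses from the ranges reserved for documentation
htaccess_text = build_deny_block(["203.0.113.7", "198.51.100.0/24"])
print(htaccess_text)
```

Validating each address before writing it matters here, because a malformed line in .htaccess can take the whole folder down with a server error.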

    The below code between the head tags on every page of my web site works for my purposes. Please feel free to modify it for your purposes.

    <META NAME="robots" CONTENT="index, follow">
    <META NAME="robots" CONTENT="noarchive">
    <META NAME="GoogleBot" CONTENT="NoArchive">

    Best,

    RadarCat, Webmaster
    http://www.os2warplinks.com

  5. #5
    ABW Ambassador AddHandler's Avatar
    Join Date
    January 19th, 2005
    Posts
    1,270
    RadarCat
    Thanks - but my cloaked pages don't have HEAD tags..
    it's just a simple PHP redirect...

    I guess I could ADD head tags in order to add the META's..
    That would work... I was just trying to do it all from one file instead of every single page...

    It does work so I guess that's what I will do from now on..
    THANKS AGAIN...!
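
For redirect pages that have no HEAD section at all, one alternative to adding meta tags is to send the same instructions as an HTTP response header, X-Robots-Tag, which Google documents as an equivalent of the robots meta tag. The sketch below is a minimal Python WSGI app, not the PHP redirect discussed in this thread. TARGET_URL is a placeholder. In PHP the same idea would be a header('X-Robots-Tag: noindex, nofollow') call before the Location header. Whether a given crawler honors the header on a redirect response is something to verify for yourself, and a crawler can only see it if robots.txt does not block that URL.

```python
# Placeholder destination; a real redirect would look this up per link
TARGET_URL = "http://www.example.com/merchant-landing-page"

def redirect_app(environ, start_response):
    """Minimal WSGI app: 302 redirect that also asks crawlers not to index this URL."""
    headers = [
        ("Location", TARGET_URL),
        ("X-Robots-Tag", "noindex, nofollow"),
        ("Content-Length", "0"),
    ]
    start_response("302 Found", headers)
    return [b""]

if __name__ == "__main__":
    # Serve locally for a quick look at the response headers
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", 8000, redirect_app) as server:
        server.handle_request()  # answers a single request, then exits
```

This keeps everything in the one redirect script, which is what the "do it all from one file" idea was after.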


  6. #6
    Full Member Tech Evangelist's Avatar
    Join Date
    March 16th, 2005
    Location
    Mesa, AZ
    Posts
    374
    There is a new attribute that can be added to the links to a page to prevent spiders from following them. To use it, add the rel="nofollow" attribute to your hyperlinks. Google, MSN and Yahoo all officially recognize this attribute. That still does not necessarily guarantee that they will not follow the link.
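
If you go the rel="nofollow" route, it is easy to miss a link or two. Here is a small sketch, using only Python's standard html.parser, that lists the links on a page that do not carry the attribute yet. The sample HTML and the /go/ paths are invented for the example.

```python
from html.parser import HTMLParser

class NofollowAudit(HTMLParser):
    """Collect the href of every <a> tag that lacks rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel may hold several space-separated tokens, or be absent
        rel_tokens = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_tokens:
            self.missing.append(href)

def links_missing_nofollow(html):
    """Return hrefs of links in html that have no rel="nofollow"."""
    audit = NofollowAudit()
    audit.feed(html)
    return audit.missing

# Invented sample page, for illustration only
sample = '<a href="/go/a.php" rel="nofollow">A</a> <a href="/go/b.php">B</a>'
print(links_missing_nofollow(sample))
```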

  7. #7
    Roll Tide mobilebadboy's Avatar
    Join Date
    January 18th, 2005
    Location
    Mobile, Alabama
    Posts
    1,220
    It's also good to remember that Google doesn't grab the robots file every time it comes back. At least it didn't use to. So even if you made changes to it there's nothing saying Google has read it since you made them.

    As soon as I make any robots changes I go to Google's remove form and have them removed (almost instantly). As long as the disallow exists it will remove them from the index.


  8. #8
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    I know this thread is old and everyone in here (except me) knows what to do now, but I just tried to make a sitemap for a new site and it indexed all my redirects. I'm still learning PHP and MySQL, but until I get there I'm using teeny redirect URL files and would like for the bots not to go there. They are blocked via robots.txt and I have put .htaccess in each folder, but something is not right. Help?

  9. #9
    http and a telephoto
    Join Date
    January 18th, 2005
    Location
    NYC
    Posts
    17,708
    SuZe?
    Deborah Carney

  10. #10
    Full Member Tech Evangelist's Avatar
    Join Date
    March 16th, 2005
    Location
    Mesa, AZ
    Posts
    374
    2busy

    The robots.txt file is a long-term solution. Although Google claims to read the file regularly, I have seen directories that were blocked in the robots.txt still show up in both Google and Yahoo for six months or more.

    Adding the following meta tags works best for individual pages. This covers just about everything with legitimate spiders.

    <meta name="robots" content="noindex,follow">
    <meta name="robots" content="noarchive">

    Adding the rel="nofollow" attribute to each hyperlink tends to work best for individual links. Google, Yahoo and MSN each recognize this attribute.

    If you are sending users through a script such as click.php, you should also block that in the robots.txt.

    example:

    Disallow: /click.php

  11. #11
    ABW Ambassador 2busy's Avatar
    Join Date
    January 17th, 2005
    Location
    Tropical Mountaintop
    Posts
    5,636
    Thank you Tech Evangelist,
    It is a new site so nothing is indexed yet. I was in the process of submitting it to Google when the sitemap generator made me look for an answer before completing the process. This is what I have now for robots.txt:
    User-agent: *
    Disallow: /cgi-bin
    Disallow: /A Link Folder
    Disallow: /Another Link Folder
    Disallow: /Another Link Folder
    Disallow: /Another Link Folder

    where "Another Link Folder" is one of the folders where my little .php redirects are. I also put a little .htaccess in each folder with: IndexIgnore *.php. There is nothing else in these folders except the little redirects and the .htaccess file. I'm sure this will all be easier once I get a good handle on the MySQL end of it, but for now I just want it to run smoothly with being indexed.
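
Two things may be worth checking with a setup like this. First, IndexIgnore only hides files from Apache's automatically generated directory listings; it does not stop a bot from requesting a file it already has a link to. Second, if the real folder names contain spaces, I am not certain every crawler reads a literal space in a Disallow line the same way, so writing the path percent-encoded (%20) looks like the safer form. A sketch in Python, using the placeholder folder names from the post above:

```python
from urllib.parse import quote

def disallow_lines(folders):
    """Build robots.txt text with each folder path percent-encoded."""
    lines = ["User-agent: *"]
    for folder in folders:
        # Leading and trailing slash so the rule covers that folder's contents only
        path = "/" + folder.strip("/") + "/"
        lines.append("Disallow: " + quote(path))
    return "\n".join(lines) + "\n"

# Placeholder folder names taken from the post above
robots_text = disallow_lines(["cgi-bin", "A Link Folder"])
print(robots_text)
```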

  12. #12
    ABW Veteran Mr. Sal's Avatar
    Join Date
    January 18th, 2005
    Posts
    6,795
    Quote Originally Posted by Tech Evangelist
    If you are sending users through a script such as click.php, you should also block that in the robots.txt.

    example:

    Disallow: /click.php
    I don't know but........

    Last month I did that, then in Google Webmaster Tools they showed up as errors (even though they say that I may see an error because the links might be blocked), so I removed it from the robots.txt and now they don't show as error links in GWT.
