  1. #1
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    UK
    Posts
    539
I have a baby name database, which goes on forever, and wget has pulled 2,000 pages from it today so far, costing me about 0.5 GB of bandwidth. I need help stopping it. What do I put in my site? Does it go in the meta tags?

    Please help

  2. #2
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    UK
    Posts
    539
OK, I have since added a robots.txt file with
User-agent: Wget
Disallow: /
in it, plus similar rules for a few other agents!

    I found the ip address of the person which is 24.84.39.65
    I have emailed the abuse address that i found while looking for the owner.

Is there anything else I can do? I assume they want to nick my 20,000 baby names. Or am I just stupid for not having blocked robots in the first place?
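For anyone copying this later, a minimal sketch of that robots.txt (only effective against crawlers that honour robots.txt, which wget does by default) looks like:

User-agent: Wget
Disallow: /

Note that Disallow: / blocks the whole site while an empty Disallow: allows everything, and the plain token Wget covers every version, so there is no need to pin it to Wget/1.6. A scraper can still run wget with robots switched off (-e robots=off), which is why the .htaccess approaches later in this thread are the stronger fix.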

  3. #3
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    UK
    Posts
    539
OK, it's now up to 40,000 pages. Please help!

  4. #4
    pph Expert! Gordon's Avatar
    Join Date
    January 18th, 2005
    Location
    Edmonton Canada
    Posts
    5,781
I'm sorry, Tamalyn, I wish I could help you. Just hang in there; I'm sure someone will pop in and answer your question shortly.

    Take care
    YouTrek.com

  5. #5
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    UK
    Posts
    539
OK, just to let anyone who has this problem know: I have blocked the IP address with .htaccess, and also used this PHP script
http://www.hotscripts.com/Detailed/11062.html

and also added a robots.txt with
User-agent: Wget
Disallow: /

So I have no idea which of those stopped the sucker, but I think I am safe for a while!
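For anyone else, the .htaccess part of that can be as small as this sketch (Apache 1.3/2.2 Allow/Deny syntax, using the IP address reported above; your host must permit these directives in .htaccess):

# Refuse every request from the offending IP address
Order Allow,Deny
Allow from all
Deny from 24.84.39.65

Apache then answers that client with 403 Forbidden instead of serving any pages.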

  6. #6
    Animal Lover
    Join Date
    January 18th, 2005
    Location
    oz
    Posts
    1,210
Glad you managed to stop it - I would've loved to help, but I really didn't have the expertise!

    Oscar

  7. #7
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    UK
    Posts
    539
Thanks - I didn't have the expertise either. I think I have learnt a lot tonight! I hate to pay for bandwidth, especially when it's stolen.

I will find the site that uses my baby names, and they will be very sorry!

  8. #8
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
Australia
    Posts
    1,372
Robots.txt usually won't help in these kinds of situations, because the only bots that obey robots.txt are the ones that have been programmed to obey it.

.htaccess, or a script that does the blocking for you (if you can't use .htaccess), would be a better method. I'm sure there are other ways too.
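As a rough sketch of the script approach (hypothetical code, not the script linked earlier; assumes PHP 5 or later for stripos), something like this at the top of every page turns known downloaders away:

<?php
// Hypothetical user-agent ban: refuse known downloaders before doing any work.
$banned = array('Wget', 'HTTrack', 'WebZIP'); // substrings to block, case-insensitive
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
foreach ($banned as $bot) {
    if (stripos($ua, $bot) !== false) {
        header('HTTP/1.0 403 Forbidden'); // send an error status instead of the page
        exit('Access denied.');
    }
}
?>

Bear in mind that a determined scraper can fake its User-Agent string, so this only stops the lazy ones.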

  9. #9
    Animal Lover
    Join Date
    January 18th, 2005
    Location
    oz
    Posts
    1,210
    Yo Pete,

    a fellow Ozzie...greetings from Sydney...

    Oscar

  10. #10
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,663
Global.asa is an option if you're using ASP pages.

    Wayne

  11. #11
    ABW Ambassador Nova's Avatar
    Join Date
    January 18th, 2005
    Location
    home
    Posts
    2,395
Hi Tamalyn,

I am not sure if this will help you find this person.

Try typing http:// followed by the IP number and then .com.

Not sure if I'm right, but I think you can find out who that person is if you do this.

Just my two cents.

Glad you blocked this leech!

    ------------------------------
    What does the COC stand for? Crooks Overwriting Commissions.
    Don't worry! Tracking is infected!
    ------------------------------
    Love Life to the fullest. we only get ONE chance! :-) !

  12. #12
    Full Member
    Join Date
    January 18th, 2005
    Posts
    480
If your host allows .htaccess files, then this is a good option. I use it on all my sites. It is designed to block a lot of the downloaders, etc. It also blocks a few robots I do not like. For example, it will block all bots whose User-Agent starts with "CJ".

    USE AT YOUR OWN RISK. I KNOW NOTHING, DO NOT CLAIM TO KNOW ANYTHING AND AM NOT RESPONSIBLE FOR ANYTHING.

    ===== COPY BELOW THIS LINE BUT NOT THIS LINE=====


    Options ExecCGI FollowSymLinks Includes
    AddHandler server-parsed .html
    AddType application/x-httpd-cgi .cgi

    RewriteEngine on

    RewriteBase /
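# Each condition below matches one downloader's User-Agent string; [NC] makes
# the match case-insensitive and [OR] chains it with the following condition.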
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^CJ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NPBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BaiDuSpider [NC]
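# If any condition above matched, refuse the request with 403 Forbidden.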
    RewriteRule ^.* - [F,L]

    ===== COPY ABOVE THIS LINE BUT NOT THIS LINE=====


Make a file in your web site's web directory called ".htaccess" and copy and paste the above into it. It may or may not work.

  13. #13
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,403
    Mrs Happypoon,
yes, it works!
