  1. #1
    Animal Lover
    Join Date
    January 18th, 2005
    Location
    oz
    Posts
    1,210
    Is there any way you can check from your stats if someone is copying your site? I've noticed that there are a few hits that stayed on the site for more than an hour - is that an indication?

    Oscar

  2. #2
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    This robots.txt file takes care of most of the automated site copiers.

    User-agent: Mediapartners-Google*
    Disallow:

    User-agent: *
    Disallow: /stats/
    Disallow: /_private/
    Disallow: /_borders/
    Disallow: /_fpclass/
    Disallow: /_overlay/
    Disallow: /_themes/
    Disallow: /_vti_bin/
    Disallow: /_vti_cnf/
    Disallow: /_vti_log/
    Disallow: /_vti_pvt/
    Disallow: /_vti_txt/
    Disallow: /images/
    Disallow: /club/

    User-agent: TurnitinBot
    Disallow: /

    User-agent: grub-client
    Disallow: /

    User-agent: grub
    Disallow: /

    User-agent: looksmart
    Disallow: /

    User-agent: WebZip
    Disallow: /

    User-agent: larbin
    Disallow: /

    User-agent: b2w/0.1
    Disallow: /

    User-agent: psbot
    Disallow: /

    User-agent: Python-urllib
    Disallow: /


    User-agent: URL_Spider_Pro
    Disallow: /

    User-agent: CherryPicker
    Disallow: /

    User-agent: EmailCollector
    Disallow: /

    User-agent: EmailSiphon
    Disallow: /

    User-agent: WebBandit
    Disallow: /

    User-agent: EmailWolf
    Disallow: /

    User-agent: ExtractorPro
    Disallow: /

    User-agent: CopyRightCheck
    Disallow: /

    User-agent: Crescent
    Disallow: /

    User-agent: SiteSnagger
    Disallow: /

    User-agent: ProWebWalker
    Disallow: /

    User-agent: CheeseBot
    Disallow: /

    User-agent: LNSpiderguy
    Disallow: /

    User-agent: ia_archiver
    Disallow: /

    User-agent: ia_archiver/1.6
    Disallow: /

    User-agent: Alexibot
    Disallow: /

    User-agent: Teleport
    Disallow: /

    User-agent: TeleportPro
    Disallow: /

    User-agent: MIIxpc
    Disallow: /

    User-agent: Telesoft
    Disallow: /

    User-agent: Website Quester
    Disallow: /

    User-agent: moget/2.1
    Disallow: /

    User-agent: WebZip/4.0
    Disallow: /

    User-agent: WebStripper
    Disallow: /

    User-agent: WebSauger
    Disallow: /

    User-agent: WebCopier
    Disallow: /

    User-agent: NetAnts
    Disallow: /

    User-agent: Mister PiX
    Disallow: /

    User-agent: WebAuto
    Disallow: /

    User-agent: TheNomad
    Disallow: /

    User-agent: WWW-Collector-E
    Disallow: /

    User-agent: RMA
    Disallow: /

    User-agent: libWeb/clsHTTP
    Disallow: /

    User-agent: asterias
    Disallow: /

    User-agent: httplib
    Disallow: /

    User-agent: turingos
    Disallow: /

    User-agent: spanner
    Disallow: /

    User-agent: InfoNaviRobot
    Disallow: /

    User-agent: Harvest/1.5
    Disallow: /

    User-agent: Bullseye/1.0
    Disallow: /

    User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
    Disallow: /

    User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
    Disallow: /

    User-agent: CherryPickerSE/1.0
    Disallow: /

    User-agent: CherryPickerElite/1.0
    Disallow: /

    User-agent: WebBandit/3.50
    Disallow: /

    User-agent: NICErsPRO
    Disallow: /

    User-agent: Microsoft URL Control - 5.01.4511
    Disallow: /

    User-agent: DittoSpyder
    Disallow: /

    User-agent: Foobot
    Disallow: /

    User-agent: SpankBot
    Disallow: /

    User-agent: BotALot
    Disallow: /

    User-agent: lwp-trivial/1.34
    Disallow: /

    User-agent: lwp-trivial
    Disallow: /

    User-agent: BunnySlippers
    Disallow: /

    User-agent: Microsoft URL Control - 6.00.8169
    Disallow: /

    User-agent: URLy Warning
    Disallow: /

    User-agent: Wget/1.6
    Disallow: /

    User-agent: Wget/1.5.3
    Disallow: /

    User-agent: Wget
    Disallow: /

    User-agent: LinkWalker
    Disallow: /

    User-agent: cosmos
    Disallow: /

    User-agent: moget
    Disallow: /

    User-agent: hloader
    Disallow: /

    User-agent: humanlinks
    Disallow: /

    User-agent: LinkextractorPro
    Disallow: /

    User-agent: Offline Explorer
    Disallow: /

    User-agent: Mata Hari
    Disallow: /

    User-agent: LexiBot
    Disallow: /

    User-agent: Web Image Collector
    Disallow: /

    User-agent: The Intraformant
    Disallow: /

    User-agent: True_Robot/1.0
    Disallow: /

    User-agent: True_Robot
    Disallow: /

    User-agent: BlowFish/1.0
    Disallow: /

    User-agent: JennyBot
    Disallow: /

    User-agent: MIIxpc/4.2
    Disallow: /

    User-agent: BuiltBotTough
    Disallow: /

    User-agent: ProPowerBot/2.14
    Disallow: /

    User-agent: BackDoorBot/1.0
    Disallow: /

    User-agent: toCrawl/UrlDispatcher
    Disallow: /

    User-agent: WebEnhancer
    Disallow: /

    User-agent: suzuran
    Disallow: /

    User-agent: VCI WebViewer VCI WebViewer Win32
    Disallow: /

    User-agent: VCI
    Disallow: /

    User-agent: Szukacz/1.4
    Disallow: /

    User-agent: QueryN Metasearch
    Disallow: /

    User-agent: Openfind data gathere
    Disallow: /

    User-agent: Openfind
    Disallow: /

    User-agent: Xenu's Link Sleuth 1.1c
    Disallow: /

    User-agent: Xenu's
    Disallow: /

    User-agent: Zeus
    Disallow: /

    User-agent: RepoMonkey Bait & Tackle/v1.01
    Disallow: /

    User-agent: RepoMonkey
    Disallow: /

    User-agent: Microsoft URL Control
    Disallow: /

    User-agent: Openbot
    Disallow: /

    User-agent: URL Control
    Disallow: /

    User-agent: Zeus Link Scout
    Disallow: /

    User-agent: Zeus 32297 Webster Pro V2.9 Win32
    Disallow: /

    User-agent: Webster Pro
    Disallow: /

    User-agent: EroCrawler
    Disallow: /

    User-agent: LinkScan/8.1a Unix
    Disallow: /

    User-agent: Keyword Density/0.9
    Disallow: /

    User-agent: Kenjin Spider
    Disallow: /

    User-agent: Iron33/1.0.2
    Disallow: /

    User-agent: Bookmark search tool
    Disallow: /

    User-agent: GetRight/4.2
    Disallow: /

    User-agent: FairAd Client
    Disallow: /

    User-agent: Gaisbot
    Disallow: /

    User-agent: Aqua_Products
    Disallow: /

    User-agent: Radiation Retriever 1.1
    Disallow: /

    User-agent: WebmasterWorld Extractor
    Disallow: /

    User-agent: Flaming AttackBot
    Disallow: /

    User-agent: Oracle Ultra Search
    Disallow: /

    User-agent: PerMan
    Disallow: /

    User-agent: searchpreview
    Disallow: /
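
    To see how a well-behaved crawler would read rules like these, here is a minimal sketch using Python's standard urllib.robotparser. The RULES text is a shortened, hypothetical excerpt of the list above, not the full file, and the allowed() helper is only an illustration. It says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical excerpt of the rules above (not the full list).
RULES = """\
User-agent: *
Disallow: /stats/
Disallow: /images/

User-agent: Wget
Disallow: /

User-agent: WebZip
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under RULES."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [("Googlebot", "/index.html"),
                        ("Googlebot", "/stats/report.html"),
                        ("Wget/1.6", "/index.html")]:
        print(agent, path, allowed(agent, path))
```

    Under these assumed rules an ordinary search bot can still fetch the public pages and only skips the listed directories, while a named copier is refused everything.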

    Mike & Charlie ...

    If they won't adopt and feed a bird ..flip them one! BBQ some Gator and remember to flush WhenU..

  3. #3
    Animal Lover
    Join Date
    January 18th, 2005
    Location
    oz
    Posts
    1,210
    Wow, that's quite a list Charlie...

    thanks for that!

    Oscar

  4. #4
    Newbie
    Join Date
    January 18th, 2005
    Posts
    49
    So, if I understand (forgive a newbie), this file, named robots.txt, placed in your root folder will keep the info away from site copiers? I am REALLY interested in this point, as I am starting a generic type of shopping portal. Thanks for any advice. I promise Charlie a nice meal for a nice response!

    -gray day
    "Be really whole and all things will come to you"

    -Lao Tsu

  5. #5
    2005 Linkshare Golden Link Award Winner  ecomcity's Avatar
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    Works like a charm on the automated site copying tools and even the offline reading tools that cache sites locally. With the FrontPage exclusions they have to know the FP username/password. Then again, it is not worth much on my site, as the general consensus is it ain't worth copying any section. I bloat on purpose to keep out the cheapskate dialup shoppers, as I sell no junk for the merchants. I also embed some stuff in the images so I can easily find the copiers if they snatch pages.

    Mike & Charlie ...

    If they won't adopt and feed a bird ..flip them one! BBQ some Gator and remember to flush WhenU..

  6. #6
    Newbie
    Join Date
    January 18th, 2005
    Posts
    49
    But does it affect how bots read your site? Will Googlebot get confused when trying to read it, and move on?

    -gray day
    "Be really whole and all things will come to you"

    -Lao Tsu

  7. #7
    Newbie
    Join Date
    January 18th, 2005
    Posts
    3,219
    I stopped the disallows and instead obtained copyright. Then I placed a code word on every page. Normally it looks like a simple typo.

    Example: "and the dog walked by the sunflower, a car hit him."

    "Sunflower" is the key word, or the key phrase is "sunflower, a car hit him". Most thieves simply copy everything and have no idea about the keywords/phrases.

    Look for your keyword/phrase on the engines and send the results to the lawyer you retained. Voilà, a site making maybe $500 a month can now be a cash cow for suits!

    So if anyone has thought about stealing my stuff... Guess I will read your crying post later! Oh wait, I demand prosecution for the most blatant stealing, and a few people cannot even look at a computer for a few years!

    My way does work, even if it takes a bit to ensure they are prosecuted and sued civilly.
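
    Once a suspect page has turned up, the code-word check can be roughly sketched like this. It assumes you already have the page text in hand; the function names and the sample text are hypothetical, and it does not query any search engine.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so re-wrapped lines still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def contains_marker(page_text: str, marker: str) -> bool:
    """Return True if the planted key phrase appears in the page text."""
    return normalize(marker) in normalize(page_text)


if __name__ == "__main__":
    marker = "sunflower, a car hit him"
    # Hypothetical text saved from a suspect page.
    suspect = "...and the dog walked by the\n   Sunflower, a car   hit him. ..."
    print(contains_marker(suspect, marker))
```

    Normalizing case and whitespace is a design choice here: copied pages are often re-wrapped or re-cased, and a plain substring test would miss those.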

    SandraR

  8. #8
    Newbie
    Join Date
    January 18th, 2005
    Posts
    49
    As well they should be. Go get 'em.

    -gray day
    "Be really whole and all things will come to you"

    -Lao Tsu

  9. #9
    Just Lurking
    Join Date
    January 18th, 2005
    Posts
    1,263
    I just wait for them to show up here and start griping about not making any sales. THAT'S RIGHT I SUSPECT ALL OF YOU HAVE COPIED MY SITE!

    Mike's robots.txt will help discourage the simple site copier, but if the scum knows how to use the tool he'll come in looking just like any other browser.

    Nice robots.txt file Mike.

    ------------------------------
    "Everybody gets so much information all day long that they lose their common sense." - Gertrude Stein, American author (1874-1946).

  10. #10
    Pit Boss redsand's Avatar
    Join Date
    January 18th, 2005
    Location
    Oak Grove
    Posts
    642
    I just moved to a new host and began to notice requests for robots.txt in the error log. What's that supposed to mean? Do I need to create a robots.txt file myself, since not all web-hosting companies necessarily provide one?

  11. #11
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    1,663
    quote: "I stopped the disallows and instead obtained copyright."

    Robots.txt does nothing to prevent a determined site copier. It really is a "Keep Out" sign on a room with the door wide open. Google and other bots with integrity will pass that room by and only go in rooms without the sign.

    Bad and malicious bots, like Cyveillance, Amazon, and any site copier, will ignore the sign, walk in and look around. Site copiers will help themselves to what they find.

    Use global.asa (on a Windows server) or .htaccess to totally block a bot (or anyone, if you know the IP) from accessing your site. In all these cases, though, you have to know who the site copier is or the name of their robot, if they use one. I suspect this type of theft is done by manually clicking on "save as", since a robot will have no way of deciding if your site merits copying.
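
    The same idea can be sketched outside global.asa or .htaccess. Below is a small Python WSGI wrapper as a stand-in, not the setup described above. The blocked names and the IP address (taken from a documentation-only range) are placeholders, and a bot that fakes a browser User-Agent will still get through.

```python
# Hypothetical block lists; fill in names and addresses from your own logs.
BLOCKED_AGENTS = ("wget", "webzip", "teleport")
BLOCKED_IPS = {"203.0.113.7"}  # placeholder address from a documentation range


def is_blocked(user_agent: str, remote_addr: str) -> bool:
    """Match on the client IP or on a substring of the User-Agent header."""
    ua = user_agent.lower()
    return remote_addr in BLOCKED_IPS or any(name in ua for name in BLOCKED_AGENTS)


def block_bots(app):
    """Wrap a WSGI app so blocked clients get 403 instead of the page."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        ip = environ.get("REMOTE_ADDR", "")
        if is_blocked(ua, ip):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

    Unlike robots.txt, this refuses the request on the server side, so it does not depend on the bot's good manners, only on you knowing its name or address.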

    I have a little-used site with a spider trap. Bots sometimes nose around a blocked directory and find an ASP-generated file that says "You should have read our robots.txt file." It then displays a new link to the same file but with an ever-increasing number as a parameter. I've had several bots turn up over the past few days and fall into the trap. One grabbed this file 27,565 times over two days before giving up.
    After feeding, they are then blocked by the global.asa file.
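
    For anyone curious how a trap like that hangs together, here is a rough Python sketch of the logic rather than the actual ASP file. The /trap/page.asp path, the max_hits threshold and the class name are all made up for illustration.

```python
from collections import Counter
from typing import Optional


class SpiderTrap:
    """Serve an endless chain of links, then block clients that keep following it."""

    def __init__(self, max_hits: int = 3):
        self.max_hits = max_hits      # hypothetical threshold before blocking
        self.hits = Counter()         # requests seen per client
        self.blocked = set()          # clients that have been cut off

    def serve(self, client: str, n: int) -> Optional[str]:
        """Return the trap page for parameter n, or None once the client is blocked."""
        if client in self.blocked:
            return None
        self.hits[client] += 1
        if self.hits[client] > self.max_hits:
            self.blocked.add(client)
            return None
        # Each page links to the same file with the number bumped by one.
        return (
            "<p>You should have read our robots.txt file.</p>\n"
            f'<a href="/trap/page.asp?n={n + 1}">next</a>'
        )
```

    The trap directory itself has to be disallowed in robots.txt, so that only bots ignoring the file ever reach it.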

    Wayne

  12. #12
    ABW Adviser Panel Dynamoo's Avatar
    Join Date
    January 18th, 2005
    Location
    Opposite the Slough of Despond
    Posts
    5,465
    It's fairly easy to find a unique string in a web page to see if it's been copied. The key thing is to choose a few words, across a sentence break. This basically guarantees uniqueness, because although many sentences are made up of common sequences of words, the flow of ideas indicated by the different sentences is what actually makes the thing unique.

    For example, search for "sentence break. This basically" in Google and there are no matches for that series of four words anywhere, but try it with "common sequences of words" and you get 20 matches.

    Even if someone has changed the text slightly, searching across the sentence break will be more reliable because it needs a smaller series of words for a match.
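
    Picking those cross-sentence phrases can be automated. This is a minimal sketch, assuming sentences end in ".", "!" or "?" followed by whitespace; the function name and the two-words-either-side default are just illustrative choices.

```python
import re
from typing import List


def boundary_phrases(text: str, before: int = 2, after: int = 2) -> List[str]:
    """Build search phrases from the last `before` words of each sentence
    plus the first `after` words of the sentence that follows it."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    phrases = []
    for left, right in zip(sentences, sentences[1:]):
        tail = left.split()[-before:]
        head = right.split()[:after]
        phrases.append(" ".join(tail + head))
    return phrases


if __name__ == "__main__":
    sample = ("The key thing is to choose a few words, across a sentence break. "
              "This basically guarantees uniqueness.")
    for phrase in boundary_phrases(sample):
        print(f'"{phrase}"')
```

    Each phrase it prints can be pasted into a search engine in quotes, the same way as the example above.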

    ________
    "All your commission are belong to us." - Slimeware Corporation
