  1. #1
    Newbie
    Join Date
    January 18th, 2005
    Posts
    6
    Hello,

    I love Webmerge to create my pages, but I have a question on an unrelated topic. I work with an affiliate that does not have a data feed. I would like to "scrape" their pages so I can present the pages myself the way I would like to. Any tools or macro products out there that would be good for this use?

    Thanks.

  2. #2
    ABW Ambassador Mike O
    Join Date
    January 18th, 2005
    Location
    Los Angeles area
    Posts
    843
    Hi, KS:

    Two questions:

    1) Something you said is confusing me (easily done, I'm afraid), so please clarify to help me understand. You say you work with an "affiliate" that does not have a datafeed. Do you mean "vendor"?

    2) There are significant issues with scraping a site's content. Here are a couple:

    A) Do you have clear, unequivocal approval from the vendor to copy or use anything you wish from his site?

    B) Does the vendor himself have the right to let you copy everything on his site? I'm thinking of Amazon, for example, where some of the book reviews they use cannot be copied and used by affiliates, while others can.

    Just trying to get things clear in my head about what you're trying to do.

    -- Mike

    "Men travel faster now,
    but I do not know if they go to better things."
    -- Willa Cather

  3. #3
    Newbie
    Join Date
    January 18th, 2005
    Posts
    6
    Hello,

    Sorry for the confusion, but yes, I do have permission to use the content. The vendor has provided a cobranded site that they power. Unfortunately, their site is very hard to navigate and use for my customers so I want to use their content and create a better front end.

    I do have their permission to do so, but they do not have a datafeed with the data in an easy-to-use format.

    Hope that helps.

  4. #4
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    I have a similar situation. Just had to get over 5,500 links and images by hand! The vendor knows and loves me. He has a new site and new products and wants me to do the same thing there. Need help.

    Mr. Merchant, if you do business in any way whatsoever with parasites, your products will not be sold on my sites!!

    Farewell, CJ! I loved you when you were young and pure. I will try to remember you that way. Disclaimer: Comments are to be interpreted as opinion unless otherwise noted.

  5. #5
    2005 Linkshare Golden Link Award Winner ecomcity
    Join Date
    January 18th, 2005
    Location
    St Clair Shores MI.
    Posts
    17,328
    The solution ... www.webventuri.com, and tell them Mike sent you. Even the simple one-liner below will display linkable product data. Cut and paste it and try it on your site.

    <script language="JavaScript" src="http://www.webventuri.com/beta/prodview.js?vid=1455058"></script>

    Mike & Charlie ...

    If they won't adopt and feed a bird ..flip them one! BBQ some Gator and remember to flush WhenU..

    [This message was edited by EcomCity.com on October 15, 2003 at 06:32 AM.]

  6. #6
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    Interesting. But it would be my luck that they would contact the merchant, sell them the service and then all the affiliates would have the info which I need like I need to be shot.

    Is there some way I can do it myself?


  7. #7
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    England
    Posts
    4,327
    If you do it, and it can be done successfully, surely the merchant would want other affiliates to also do it?
    If you give the merchant the idea of what you are doing, what would stop them from doing it so all of their affiliates would have access to a datafeed? So you do all the work and then find out that the merchant has also done it for themselves, and then given all of your competitors access to the datafeed.

    www.cjshoppingnetwork.com

  8. #8
    Newbie
    Join Date
    January 18th, 2005
    Posts
    6
    I did some research, and I think I've found one possible option. It's called Macro Scheduler, and basically it's a macro program that you can program any way you like.

    With some playing around, I think I can have it go through each page one by one, download it, then open it up, look for patterns in the file, and extract the data based on those patterns. It will take a bit of work to set up, but once it's done, it'll be efficient to run on many different pages. If the site changes its layout, a little tweaking and you'll be set.
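    The download-then-match approach described above can be sketched in a few lines of Python (this is not Macro Scheduler's own script language). It is a minimal sketch under stated assumptions: the field patterns and class names are hypothetical and must be rewritten against the real site's HTML, and `fetch` is passed in so the loop can be tried on saved pages before touching the live site.

```python
import re
from typing import Callable, Dict, List, Optional
from urllib.request import urlopen

# Hypothetical patterns: they assume each product page contains markup like
#   <h1 class="title">Blue Mug</h1> ... <span class="price">$12.50</span>
# A real site needs patterns written against its own HTML.
FIELD_PATTERNS = {
    "title": re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
    "price": re.compile(r'<span class="price">\$?([\d.,]+)</span>'),
}


def extract_fields(html: str) -> Dict[str, Optional[str]]:
    """Apply each pattern to one page; a field is None when its pattern misses."""
    record: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(html)
        record[name] = match.group(1).strip() if match else None
    return record


def scrape_pages(urls: List[str], fetch: Callable[[str], str]) -> List[Dict[str, Optional[str]]]:
    """Download each page in turn with `fetch`, then extract its fields."""
    rows = []
    for url in urls:
        row = extract_fields(fetch(url))
        row["url"] = url
        rows.append(row)
    return rows


def fetch_url(url: str) -> str:
    """Plain HTTP download with the standard library (no retries or throttling)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

    To run it against live pages, pass `fetch_url` as the `fetch` argument. When the layout changes, the idea is that only `FIELD_PATTERNS` should need tweaking.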

  9. #9
    I like traffic lights
    Join Date
    January 18th, 2005
    Location
    Southern hemisphere - away from Fukushima
    Posts
    2,936
    Perl is your friend.

    ****************************
    Jimmy James Inc. fan club membership # 3312

    "But Jimmy had fancy plans, and pants to match"

  10. #10
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Originally posted by K S:
    "Hello,

    I love Webmerge to create my pages, but I have a question on an unrelated topic. I work with an affiliate that does not have a data feed. I would like to "scrape" their pages so I can present the pages myself the way I would like to. Any tools or macro products out there that would be good for this use?"

    I've written a scraper before -- not a lot of fun, since it means a lot of work for just one site that'll break if the layout changes.

    If this vendor's serious about working with affiliates, why not provide a feed? It seems likely that the data is already in a database -- a quick export and the vendor's done a great service for himself and his affiliates.

    --
    Richard Gaskin
    Fourth World Media Corporation
    Developer of WebMerge: Publish any database on any site
    ___________________________________________________________
    Ambassador@FourthWorld.com http://www.FourthWorld.com
    Tel: 323-225-3717 AIM: FourthWorldInc

  11. #11
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    I for sure don't want my merchant offering everyone who comes down the pike a data feed. I like having all that for myself even if I did have to do it all by hand.


  12. #12
    ABW Ambassador buy_online
    Join Date
    January 18th, 2005
    Location
    Richmond, VA
    Posts
    3,234
    "I for sure don't want my merchant offering everyone who comes down the pike a data feed."

    SSanf, have you talked to the merchant about this, and asked them not to share? If not, there's not much you can do about it...

    Fred

    You might just be a Redneck if - Birds are attracted to your beard...

  13. #13
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Originally posted by SSanf:
    "Mr. Merchant, if you do business in any way whatsoever with parasites, your products will not be sold on my sites!!"

    A basic question for my own education: in this context, what is a "parasite"? Is there an FAQ or glossary that I can refer to?

    --
    Richard Gaskin

  14. #14
    ABW Founder Haiko de Poel, Jr.
    Join Date
    January 18th, 2005
    Location
    New York
    Posts
    21,609
    Richard,

    See www.parasiteware.com

    Haiko

    The secret of success is constancy of purpose. ~ Disraeli

  15. #15
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Originally posted by Haiko:
    "Richard,

    See http://www.parasiteware.com

    Haiko

    The secret of success is constancy of purpose. ~ Disraeli"

    Wow. Thanks.

    Isn't there a law against such blatant abuse, esp. in cases of commission theft?

    --
    Richard Gaskin

  16. #16
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    Well, I am confused. This merchant has three sites with similar but not identical merchandise. I would like to get a combined categorized data feed for all the merchandise and NOT have anyone else have it. Who can do this for me?

    I will pay a reasonable price for this depending on what reasonable is. Once I have the basic feed, I can add or delete new stuff myself.

    It must be understood that in accepting payment from me, this feed will not be shared in any way with any other person, nor will the creator of the feed sell it or offer to sell a similar feed to the merchant, who might snap it up but is too lazy or busy or preoccupied to come up with it on his own. Also, the creator of the feed must agree not to use this feed to compete with me in any way whatsoever. It will be my exclusive property. Additionally, by accepting the information about where the sites are, it is understood that this information will not be revealed to anyone else.

    Sorry to sound like such a snot about this, if I do.

    If anyone wants to tackle this, PLEASE, PM me.

    I know I can get permission from the merchant to do this for my own use. (He will want to buy it but I ain't selling)


  17. #17
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Originally posted by SSanf:
    "Well, I am confused. This merchant has three sites with similar but not identical merchandise. I would like to get a combined categorized data feed for all the merchandise and NOT have anyone else have it. Who can do this for me?

    I will pay a reasonable price for this depending on what reasonable is.

    It must be understood that in accepting payment from me, this feed will not be shared in any way with any other person, nor will the creator of the feed sell it or offer to sell a similar feed to the merchant, who might snap it up but is too lazy or busy or preoccupied to come up with it on his own. Also, the creator of the feed must agree not to use this feed to compete with me in any way whatsoever. It will be my exclusive property. Additionally, by accepting the information about where the sites are, it is understood that this information will not be revealed to anyone else."

    I could write one, but scrapers are a lot of work. At my normal rate it would cost between $500 and $1000. The non-compete aspect is pretty common for custom programming, but no matter who writes it, there's a strong chance it'll break the moment the layout changes.

    Quote:
    "I know I can get permission from the merchant to do this for my own use. (He will want to buy it but I ain't selling)"

    Here's the part I don't understand: If the vendor has a catalog large enough to warrant scraping, chances are it's stored in a database (I can't imagine any retailer trying to manage inventory for more than a dozen products without a database). All databases allow export to at least a tab-delimited format -- why doesn't the vendor simply give you that?
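    To show how small that export step is, here is a minimal Python sketch that writes product records as tab-delimited text with a header row, which any spreadsheet program can open. The column names (`sku`, `name`, `price`) are hypothetical; in practice the vendor's own database tool would normally produce this file directly.

```python
import csv
import io
from typing import Dict, List


def to_tab_delimited(products: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize product records as tab-delimited text with a header row.

    Missing fields are written as empty cells; keys not listed in `columns`
    are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(products)
    return buffer.getvalue()


if __name__ == "__main__":
    catalog = [
        {"sku": "A1", "name": "Blue Mug", "price": "12.50"},
        {"sku": "B2", "name": "Red Mug", "price": "11.00"},
    ]
    print(to_tab_delimited(catalog, ["sku", "name", "price"]), end="")
```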

    --
    Richard Gaskin

  18. #18
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    Did I say he is smart?

    He didn't even know what I was talking about when I asked for a database. When I tried to explain, he said it was too much work and wanted to hire me to do it for him! Maybe I am using the wrong words. Maybe I need to ask for a "spreadsheet" or something.

    Perhaps I am just asking him the question wrong. Maybe he does have it and I don't know how to get it out of him.

    I don't understand about it breaking. He very reliably tells us when products come and go, so I would just keep it up to date based on that.

    It is just a spreadsheet, right?

    (He's good hearted, though and I like him a lot. He's just not really in the marketing loop and sort of winging it.)


    [This message was edited by SSanf on December 05, 2003 at 12:46 PM.]

  19. #19
    Just Lurking
    Join Date
    January 18th, 2005
    Posts
    1,263
    Originally posted by SSanf:
    "I don't understand about it breaking. He very reliably tells us when products come and go, so I would just keep it up to date based on that.

    It is just a spreadsheet, right?"

    To scrape a website, a program has to be written to pull the useful data out of the raw HTML. If that HTML changes significantly, the scraper may stop working (break). The result of running the scraper program is, as you say, a spreadsheet, sort of.

    ------------------------------
    "A man is but the product of his thoughts. What he thinks, he becomes." -- Mahatma Gandhi

    [This message was edited by Buddha on December 05, 2003 at 02:24 PM.]

  20. #20
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    So if my merchant agrees not to change his site while it is being scraped, then all would be well and we would get a good spreadsheet, right? Depending on how long all this would take, I am sure he would agree.

    I called and talked to him. He does not have a problem with my getting a spreadsheet put together of his merchandise. He knows how I would use it and is all for anything that will help me build pages. You can talk to him if you want to.

    If I can get my mentally lazy self together and learn to use WebMerge along with this, I think I have an unusual opportunity.


    [This message was edited by SSanf on December 05, 2003 at 03:15 PM.]

  21. #21
    Moderator MichaelColey
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    If you're using Perl or are willing to learn it, check out the WWW::Mechanize and HTML::TokeParser modules. Here's a short tutorial:

    http://www.perl.com/pub/a/2003/01/22/mechanize.html

    I've used these routines to scrape many sites.
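    For anyone working in Python rather than Perl, a rough analogue of the token-by-token style described above is the standard library's `html.parser.HTMLParser` (this is not WWW::Mechanize or HTML::TokeParser, and it does no page fetching). The sketch below collects the text of every element whose `class` attribute exactly equals a given value. The class names are hypothetical, and it assumes a captured element contains no child with the same tag name.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element whose class equals target_class.

    Works token by token (start tag, text, end tag). Assumes a captured
    element has no nested child with the same tag name.
    """

    def __init__(self, target_class: str) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.target_class = target_class
        self.capturing_tag: Optional[str] = None
        self.current: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if self.capturing_tag is None and dict(attrs).get("class") == self.target_class:
            self.capturing_tag = tag
            self.current = []

    def handle_data(self, data: str) -> None:
        if self.capturing_tag is not None:
            self.current.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == self.capturing_tag:
            self.results.append("".join(self.current).strip())
            self.capturing_tag = None


def texts_with_class(html: str, target_class: str) -> List[str]:
    """Return the stripped text of each matching element, in page order."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

    Matching on an exact `class` value keeps the sketch short; a page that uses several classes per element would need a looser test.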

    Michael Coley
    Amazing-Bargains.com

  22. #22
    Just Lurking
    Join Date
    January 18th, 2005
    Posts
    1,263
    Originally posted by SSanf:
    "So if my merchant agrees not to change his site while it is being scraped, then all would be well and we would get a good spreadsheet, right? Depending on how long all this would take, I am sure he would agree."

    No, more like if your merchant agrees not to change the site after you get the scraper, not while you're scraping. You're going to use it more than once, right? Think of it this way: the scraper is only good for the version of the site it was written for and no other. If the merchant changes the header, adds a menu item, or does any of a hundred other things, it could break the scraper, and your spreadsheet would be full of garbage.

    Don't let me scare you; it's not as bad as it sounds. He can add pages and products, but if he decides the price looks better above the description instead of below it, the scraper will need fixing.
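    One way to catch that spreadsheet full of garbage before it gets published is to sanity-check every scraped row. A minimal Python sketch, assuming hypothetical `title` and `price` fields and a 10% tolerance for bad rows (both choices are illustrative, not a standard):

```python
import re
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical rule: a correctly scraped price is digits with optional cents.
PRICE_RE = re.compile(r"^\d+(\.\d{2})?$")


def row_problems(row: Row) -> List[str]:
    """Return the reasons a scraped row looks like garbage (empty list if it looks fine)."""
    problems = []
    title = row.get("title")
    price = row.get("price")
    if not title:
        problems.append("missing title")
    elif "<" in title or len(title) > 200:
        problems.append("title contains markup or is too long")
    if not price:
        problems.append("missing price")
    elif not PRICE_RE.match(price):
        problems.append(f"price {price!r} is not a number")
    return problems


def scraper_looks_broken(rows: List[Row], max_bad_fraction: float = 0.1) -> bool:
    """Treat the scraper as broken when no rows came back or too many fail the checks."""
    if not rows:
        return True
    bad = sum(1 for row in rows if row_problems(row))
    return bad / len(rows) > max_bad_fraction
```

    Running a check like this after every scrape, and refusing to publish when it returns True, turns a silent layout change into a visible "needs fixing" signal.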

    I've got a news collector that scrapes a few dozen sites every morning, well over a megabyte or two of news links. Every six months, one or two of them need fixing. The problem with merchant sites is the holidays: they change their sites for every dang gum holiday that comes along. So you should also factor in the cost of maintaining the scraper.

    I don't know your background, SSanf, but the learning curve here is very steep for someone new. It's worth it, though.


  23. #23
    Moderator MichaelColey
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    This is a really good point. I've had to rewrite several of my scrapers after merchants redesigned their sites.

    Think of it this way: The way a scraper works is that it looks for specific, unique HTML and/or wording around the fields you're wanting to extract. If that HTML or the wording changes, the scraper won't find what it was looking for.
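    A minimal Python sketch of that idea: pull out whatever sits between a unique "before" marker and an "after" marker. The marker strings are hypothetical examples; the point is that once the merchant rewords or re-tags the page, the marker is no longer found and the function returns `None` instead of a value.

```python
from typing import Optional


def between(html: str, before: str, after: str) -> Optional[str]:
    """Return the text between the first `before` marker and the next `after` marker.

    Returns None when either marker is missing, which is what happens when
    the HTML or wording the markers depend on has changed.
    """
    start = html.find(before)
    if start == -1:
        return None
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        return None
    return html[start:end].strip()


if __name__ == "__main__":
    old_page = "<b>Our price:</b> $12.50<br>"
    new_page = "<strong>Sale price:</strong> $12.50<br>"
    print(between(old_page, "<b>Our price:</b>", "<br>"))  # the price
    print(between(new_page, "<b>Our price:</b>", "<br>"))  # None after a redesign
```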

    Michael Coley
    Amazing-Bargains.com

  24. #24
    Super Sh!t Stirrer SSanf
    Join Date
    January 18th, 2005
    Posts
    9,944
    Actually, I was just looking for a spreadsheet of what he has now and figured I could keep it up to date. He is pretty good at notifying us of changes. I might even give him a mini-spreadsheet just to record changes in. I believe he would do that.

    I think you guys have something much more complex in mind.
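    If the merchant does keep a small change sheet, folding it into the master spreadsheet is straightforward. A minimal Python sketch, assuming each product has a unique ID (a hypothetical `sku` column) and each change row is marked `add`, `update` or `remove`:

```python
from typing import Dict, List

Table = Dict[str, Dict[str, str]]


def apply_changes(products: Table, changes: List[Dict[str, str]]) -> Table:
    """Apply a merchant's change log to a master product table keyed by SKU.

    Each change row has an "action" ("add", "update" or "remove"), a "sku",
    and, for add/update, the fields to set. Returns a new table; the input
    table is left unmodified.
    """
    updated = {sku: dict(fields) for sku, fields in products.items()}
    for change in changes:
        action, sku = change["action"], change["sku"]
        fields = {k: v for k, v in change.items() if k not in ("action", "sku")}
        if action == "remove":
            updated.pop(sku, None)
        elif action == "add":
            updated[sku] = fields
        elif action == "update":
            updated.setdefault(sku, {}).update(fields)
        else:
            raise ValueError(f"unknown action {action!r} for SKU {sku}")
    return updated
```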


  25. #25
    Just Lurking
    Join Date
    January 18th, 2005
    Posts
    1,263
    Originally posted by SSanf:
    "Actually, I was just looking for a spreadsheet of what he has now and figured I could keep it up to date. He is pretty good at notifying us of changes. I might even give him a mini-spreadsheet just to record changes in. I believe he would do that.

    I think you guys have something much more complex in mind."

    Ok, so you really only want it for one use. Then you don't need to worry about maintenance.



Similar Threads

  1. Product Scraping - Customization
    By crunchyoyster in forum Blogging, Mobile and Social Media
    Replies: 8
    Last Post: June 21st, 2012, 09:59 AM
  2. CouponOver.com = Another site scraping coupon theif
    By shimmy in forum Unethical Affiliates
    Replies: 9
    Last Post: February 4th, 2010, 07:24 AM
  3. The Ethics Vs Site Scraping
    By John Jupp in forum Midnight Cafe'
    Replies: 11
    Last Post: November 18th, 2007, 07:08 PM
  4. Scraping a site for making my own data feed
    By SSanf in forum Programming / Datafeeds / Tools
    Replies: 26
    Last Post: May 8th, 2006, 10:13 AM
