  #1 - Newbie (Join Date: July 7th, 2011; Location: NYC; Posts: 7)
    HELP! Affiliate Data Feeds Don't Have All the Data Needed?!
    I've been building a site with a programmer, and we've pulled feeds from CJ, LinkShare, and Google. Tons of other sites show list price, sale price, and sometimes even shipping cost on their product pages, but the raw feeds from these providers don't deliver that info. So where do those sites get those data points? Crawlers? Feed services like formetocoupon? This is the last piece of the puzzle I need to complete my site!

    Any help or advice is appreciated... thanks!

  2. #2
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Incomplete data, dissimilar data, inaccurate data, misleading data -- these are all standard contents in merchant datafeeds.

    Shipping and tax rates are very, very rarely included on a merchant's product pages -- often they vary depending on delivery location, weight, and price, and thus are only displayed after a consumer is inside the shopping cart (which is often a difficult place to scrape useful data), and often only on the second or third page of the cart process.

    I don't operate a price-comparison site, but on some of my directory sites I sometimes include shipping & handling costs if I notice that they seem unreasonably high relative to the product's price or weight. (This is usually for some smaller book publishers who really prefer not to accept small orders.) For most products, I try to exclude any merchant specifying a shipping delay of more than a few days (thus, I exclude products which merchants say "will ship in 4 to 6 weeks"), unless there is some special reason why that makes sense (customized items, or items which the merchant doesn't stock but must order from an overseas warehouse).

    Most price-comparison sites that attempt to include shipping or tax in their pricing combine data available from merchants with manually entered formulas that their staff create (sometimes from data furnished reluctantly by the merchant, and sometimes from data derived from many dozens or hundreds of shopping-cart visits).

    Some data, such as product weight, may be included in one merchant's datafeed but not others; using the weight data from one merchant, together with a product-matching algorithm, may allow the estimation of shipping charges for another merchant.
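
    Here's a rough sketch of that idea in PHP; the rate formula, UPC, and weight are all invented for illustration:

    Code:
    <?php
    // Hypothetical illustration: borrow the shipping weight from Merchant A's
    // datafeed (matched by UPC) to estimate Merchant B's weight-based charge.
    $merchantAWeights = [
        '0123456789012' => 2.5,   // UPC => shipping weight in pounds (Merchant A feed)
    ];

    // Assumed Merchant B rate table: $5.00 base charge plus $0.75 per pound.
    function estimateShipping($weightLbs) {
        return round(5.00 + 0.75 * $weightLbs, 2);
    }

    $upc = '0123456789012';       // product matched across merchants by UPC
    if (isset($merchantAWeights[$upc])) {
        echo 'Estimated Merchant B shipping: $'
            . number_format(estimateShipping($merchantAWeights[$upc]), 2) . "\n";
    }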

    It's hard enough to keep basic price data current; it's harder to keep shipping & tax charges current, which is why so many price-comparison sites don't bother to try, and why others only provide the data for merchants who include it in their datafeeds.

    It's quite routine to see a product listed on a price-comparison site at a price of $85 plus $8 shipping (for example), and then see a price of $92 plus $13 shipping in the merchant's shopping cart. That's a bad experience for all parties, and merchants who intentionally provide inaccurate data are soon excluded completely from price-comparison sites -- which is why you experience these problems more often on low-budget price-comparison sites that don't check for accuracy or provide a channel for consumer complaints.

    Computing sales tax is much more complex, since the rates vary depending on product types in each state, and also by zip code. I've certainly seen price-comparison sites that ask me for my zip code (or identify it from my IP address) but then compute the tax based on my state's basic sales tax rate, not the higher rate that applies once the surcharges imposed by my county and transit district are added (and which the merchant actually charges). Some price-comparison sites don't recognize that sales tax applies in some states only to "tangible" products (here in California, sales tax applies if I buy software that's shipped in a box, but not if I buy the exact same software as a digital download at the same price -- a case where "instant gratification" actually saves money).
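
    To make that gap concrete, here's a toy lookup; every rate and zip prefix below is invented, and real tax tables are far messier:

    Code:
    <?php
    // Toy illustration of the state-rate vs. local-rate gap.
    // All rates and zip prefixes are invented for the example.
    $stateBaseRate = 0.0725;                             // statewide base rate
    $localRates    = ['900' => 0.0950, '941' => 0.0875]; // zip prefix => combined rate

    function taxRateForZip($zip, $stateBaseRate, $localRates) {
        $prefix = substr($zip, 0, 3);
        // A careless comparison site stops at the state base rate; a careful one
        // layers the county/district surcharges on top when it knows them.
        return isset($localRates[$prefix]) ? $localRates[$prefix] : $stateBaseRate;
    }

    echo taxRateForZip('90012', $stateBaseRate, $localRates); // 0.095, not 0.0725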

    I assume that some larger price-comparison services arrange for merchants to provide the required data (for example, by including an XML file that specifies shipping rates based on weight or price, and another file identifying the states for which the merchant collects sales tax). These files might be available to other price-comparison sites willing to accept the same data format.
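
    I haven't seen any published standard for such files, but a merchant-supplied rate file might look something like this (all element names and values are invented):

    Code:
    <!-- Hypothetical merchant-supplied rate file; element names and values invented -->
    <merchant id="example-merchant">
      <shipping basis="weight" unit="lb">
        <rate min="0" max="1"   charge="4.95" />
        <rate min="1" max="5"   charge="7.95" />
        <rate min="5" max="999" charge="12.95" />
      </shipping>
      <sales-tax>
        <state code="CA" collects="true" />
        <state code="NY" collects="true" />
      </sales-tax>
    </merchant>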
    Last edited by markwelch; July 8th, 2011 at 12:28 PM.

  #3 - Newbie (Join Date: July 7th, 2011; Location: NYC; Posts: 7)
    Quote Originally Posted by markwelch View Post
    Incomplete data, dissimilar data, inaccurate data, misleading data -- these are all standard contents in merchant datafeeds.

    ...

    I assume that some larger price-comparison services arrange for merchants to provide the required data (for example, by including an XML file that specifies shipping rates based on weight or price, and another file identifying the states for which the merchant collects sales tax). These files might be available to other price-comparison sites willing to accept the same data format.
    Mark - thanks so much for the feedback...

    I agree, and I think the shipping price is the least important data point I'm concerned with - I'm MUCH more interested in how to get the list price to display alongside the current/sale price...

    If none of these big 3 networks give it in the feed, where are others getting it from? Mechanical Turks? Crawlers? There is some magic or illusion here I cannot figure out, and it's making me bananas trying to solve this. Crawlers seem logical, though arguably grey-hat or not completely legal?

  4. #4
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Over the past week, I've been doing a lot of crawling to gather data from merchant web sites (for merchants who don't provide datafeeds at all, and who have given me permission* to scrape their sites). I've been pleasantly surprised at how easy it is to isolate and extract certain pieces of data, especially price data.

    I'm using a software tool called TextPipe Pro** (from DataMystic), which can quickly parse through files on my local hard disk (they sell a separate product to do scraping, but I use my own PHP scraping scripts). I can define a set of filters for each merchant, which can take a 20K or 50K product-description page, quickly extract the data, and combine it all into a single file which I then import into Excel (because there's always some manual clean-up required).
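
    For anyone who'd rather script that step than buy TextPipe, the same filter-and-combine workflow is only a few lines of PHP; this sketch assumes the pages are already saved locally, and the path and title pattern are placeholders:

    Code:
    <?php
    // Rough sketch: apply a per-merchant "filter" to saved product pages and
    // combine the results into one CSV for clean-up in Excel.
    $pages = glob('/path/to/saved-pages/*.html');  // pages already scraped to disk
    $out   = fopen('extracted-products.csv', 'w');
    fputcsv($out, ['file', 'title']);

    foreach ($pages as $file) {
        $html  = file_get_contents($file);
        $title = preg_match('/<h1[^>]*>(.*?)<\/h1>/si', $html, $m)
            ? trim(strip_tags($m[1])) : '';
        fputcsv($out, [basename($file), $title]);
    }
    fclose($out);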

    Some merchants actually provide XML or other structured-format data that clearly identifies certain data points; others use style or id attributes which clearly label them; and the rest use a standard template that makes it relatively easy to identify and extract data (for example, the amount that appears after the text "Regular Price:" is the list price, and the amount that appears after the text "Your Price:" is the actual price). But there are some tricks: some merchants actually show three prices (list, regular, and sale), and some merchants use different templates for different product types, or for products that are on sale.
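
    As a concrete (and entirely made-up) illustration of that "Regular Price:" / "Your Price:" pattern:

    Code:
    <?php
    // Invented product-page snippet; real merchants wrap these labels in varying
    // markup, so each merchant needs its own filter (and its own sanity checks).
    $html = 'Regular Price: $129.99 ... Your Price: $94.50';

    $listPrice = preg_match('/Regular Price:\s*\$([\d,]+\.\d{2})/i', $html, $m)
        ? str_replace(',', '', $m[1]) : null;
    $salePrice = preg_match('/Your Price:\s*\$([\d,]+\.\d{2})/i', $html, $m)
        ? str_replace(',', '', $m[1]) : null;

    echo "List: $listPrice / Sale: $salePrice\n"; // List: 129.99 / Sale: 94.50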

    Here is the absolute "best case": some key data I found embedded in the header of a product page from a site I just scraped (including data that isn't visible on the rendered page, but still excluding some data I wanted to scrape [product image URLs, for example]):

    Code:
    <meta name="object.type" content="book" />
    <meta name="book.title" content="Ralph Masiello's Ocean Drawing Book" />
    <meta name="book.author" content="Masiello, Ralph" />
    <meta name="book.isbn" content="9781607341093" />
    <meta name="book.price" content="6.99" />
    <meta name="book.pages" content="48" />
    */ It's generally a good idea to ask merchants for permission before "scraping" data from their sites. (Note that some merchants expressly prohibit scraping or crawling the merchant's site -- and violation of that policy is likely to result in immediate termination of the advertising relationship.) Specifically ask them whether you should include a crawl-delay (NN seconds between page requests) or avoid specific times of day (peak traffic hours AND perhaps other times when housekeeping [backups, report generation] may be happening on the server). A crawler or scraper should ONLY extract HTML pages, not any scripts, style-sheets, or images. I had the horrific experience this past Saturday night of seeing a merchant's web server crash (while I was scraping it) and stay offline for several hours; I was relieved on Tuesday morning when the merchant told me the outage was due to a hardware failure and not because of my activity. (Remember, catastrophic failures always happen at the least convenient time -- here, at 11:00pm on Saturday night of a 3-day holiday weekend.)

    **/ Do a search here on ABW for "TextPipe" to see the comments from other users that led me to spend $395 for this software.
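
    And on the crawl-delay point in the first footnote, the politeness logic is only a few lines; here's a sketch with placeholder URL, path, and delay:

    Code:
    <?php
    // Sketch of a polite fetch loop. Only HTML pages, never scripts/CSS/images.
    $urls = ['http://www.example.com/products/123.html'];  // placeholder URL list
    $crawlDelaySeconds = 10;       // whatever delay the merchant asks for

    foreach ($urls as $i => $url) {
        $html = file_get_contents($url);
        if ($html !== false) {
            file_put_contents('/path/to/saved-pages/page-' . $i . '.html', $html);
        }
        sleep($crawlDelaySeconds); // pause between requests; never hammer the server
    }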
    Last edited by markwelch; July 8th, 2011 at 12:57 PM.

  #5 - Newbie (Join Date: July 7th, 2011; Location: NYC; Posts: 7)
    Quote Originally Posted by markwelch View Post
    Over the past week, I've been doing a lot of crawling to gather data from merchant web sites (for merchants who don't provide datafeeds at all, and who have given me permission* to scrape their sites). I've been pleasantly surprised at how easy it is to isolate and extract certain pieces of data, especially price data.
    Is crawling/scraping very common? I want to launch first with A- and B-tier merchants (e.g., Macy's, Sears, Kmart, Buy.com, eBay, Foot Locker), but will they allow crawling, or require permission first? I want my site's info to be as accurate as possible so it's reliable for visitors and maintains credibility and loyalty.

    I also see a lot of mention of marketers using WordPress to build their sites, along with plugins from feed aggregator services and the like. I originally looked at formetocoupon, but the price was prohibitive and we decided to code the site ourselves, though the learning curve with the feeds is steep and the quality of the data straight from the networks is in question... So again, if crawling is commonplace, then perhaps that is the best road to gather accurate and complete data for my site.

    Thoughts?

  #6 - Newbie (Join Date: July 7th, 2011; Location: NYC; Posts: 7)
    I read the T&Cs for Macy's in LinkShare, and they specifically state no scraping...

    How do sites like dealnews, dealsucker, and dealsplus get the sale prices AND the list prices if those don't come through in the affiliate network feeds?


  #7 - Newbie (Join Date: July 7th, 2011; Location: NYC; Posts: 7)
    Quote Originally Posted by garyaggregator View Post
    I read the T&Cs for Macy's in LinkShare, and they specifically state no scraping...

    How do sites like dealnews, dealsucker, and dealsplus get the sale prices AND the list prices if those don't come through in the affiliate network feeds?

    BUMP

