  1. #1
    Newbie adsw's Avatar
    Join Date
    September 6th, 2006
    Posts
    45
    Using Datafeeds from Numerous merchants??
    Hi,

    Just been introduced to the world of Datafeeds and getting a bit confused!!

    Have imported 3 datafeeds from 3 different merchants for a consumer-electronics price comparison site. Each of the merchants sells roughly the same range of products, but they use different names for the products. For example, the same product might be described as a "Sony 42" plasma screen TV" in one feed and "42 sony plasma TV" in the other feed.
    The site then displays these as 2 different products.

    Just wondering what people generally do in this case. Do you manually go through the feeds and change the product names? What's the maximum number of feeds you would include in a site?

    Want to include a few merchants on my site using datafeeds. Is this crazy?

    Hope this makes sense.

    Adrian

  2. #2
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    It sounds like you're trying to make a price match on the product names. This, as you can see, certainly won't work. You need to match based on UPC codes and model numbers. The tricky part with model numbers is you have to remove all the dashes and slashes from each model so you can normalize the data. Then you also need to make sure the manufacturer matches. So for example you might have ASD-352 by Sony and ASD352 by Sony Corp. They're both the same product, but you need to remove the - from the first model number and remove Corp from the 2nd Sony. Then you can make a proper match between them. UPC codes are a little easier because you don't need to match up the manufacturer, but unfortunately they aren't as prevalent as model numbers in datafeeds.
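
    A minimal sketch of the normalization described above, assuming the feed gives you separate manufacturer and model fields. The function names and the suffix list are hypothetical, not from any particular feed or library; extend the list to suit your own merchants.

```python
import re
from typing import Tuple

# Hypothetical list of corporate suffixes to drop (an assumption; extend for your feeds).
_SUFFIXES = {"corp", "corporation", "inc", "co", "ltd", "llc"}


def normalize_model(model: str) -> str:
    """Upper-case the model number and keep only letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", model.upper())


def normalize_manufacturer(name: str) -> str:
    """Lower-case the name, strip punctuation, drop trailing corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in _SUFFIXES:
        words.pop()
    return " ".join(words)


def match_key(manufacturer: str, model: str) -> Tuple[str, str]:
    """Build the (manufacturer, model) pair that two feeds are compared on."""
    return normalize_manufacturer(manufacturer), normalize_model(model)


# The example from the post: both rows end up with the same key.
print(match_key("Sony", "ASD-352"))
print(match_key("Sony Corp", "ASD352"))
```

    Storing the key in its own indexed column means the cleanup runs once per import rather than once per comparison.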

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  3. #3
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Snib, this is a great tip, thanks. I've been pondering this problem for a while; my matching success ratio between different datafeeds has been about 35-40% when I match against

    upc
    isbn
    product name + manufacturer
    mpn + manufacturer

    but if I create normalized columns for product_name, manufacturer and mpn, perhaps I can get my success ratio up. This could even speed up my cross-referencing, which currently takes 48 hours on non-normalized indexes.

    Anyway, thanks for the tip,
    James

  4. #4
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Something I just discovered yesterday was some feeds have excessive leading zeros in their UPC codes. I had to come up with a fix that extracts the last 12 characters of the UPC. In some cases the UPC is 13 characters while in other cases it's 14, so you need to come up with a solution that ignores the first one or two characters.
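
    A rough sketch of that fix, assuming the UPC arrives as a string that may carry spaces or dashes. The name `normalize_upc` is hypothetical. One addition that is not in the post: the sketch only drops the extra leading characters when they are zeros, since a 13-digit code with a nonzero first digit is more likely an EAN than a padded UPC.

```python
import re
from typing import Optional


def normalize_upc(raw: str) -> Optional[str]:
    """Return the 12-digit UPC-A from a 12-, 13-, or 14-character code, else None."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, and other non-digits
    if not 12 <= len(digits) <= 14:
        return None
    head, tail = digits[:-12], digits[-12:]
    # Assumption: only zero padding is safe to discard.
    if head.strip("0"):
        return None
    return tail


print(normalize_upc("0012345678905"))   # 13 characters, one padding zero
print(normalize_upc("00012345678905"))  # 14 characters, two padding zeros
```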

    Your 48 hour matching process sounds quite excessive. I maintain all my past matches so I only need to match up new products. By doing this it only takes a few minutes to match up the unmatched products.

    - Scott

  5. #5
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Yes, I already have the leading zero trimming, along with normalization from 8 digit UPC-E barcodes to the 12 digit UPC-A codes. Figuring out the UPC spec was fun (*not*). I am not currently considering EAN numbers, but I probably should...
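
    For anyone who has not been through the spec, here is a sketch of the UPC-E to UPC-A expansion, assuming the full 8-digit form (number system digit, six data digits, check digit). The function names are hypothetical and this is not the poster's code. The last data digit decides where the suppressed zeros go back in, and the check digit is verified against the expanded number.

```python
import re


def upca_check_digit(first11: str) -> str:
    """Check digit for the first 11 digits of a UPC-A code."""
    odd = sum(int(d) for d in first11[0::2])   # positions 1, 3, 5, ...
    even = sum(int(d) for d in first11[1::2])  # positions 2, 4, 6, ...
    return str((10 - (odd * 3 + even) % 10) % 10)


def expand_upce(upce: str) -> str:
    """Expand an 8-digit UPC-E code to its 12-digit UPC-A form."""
    if not re.fullmatch(r"[01][0-9]{7}", upce):
        raise ValueError("expected 8 digits starting with 0 or 1")
    number_system, body, check = upce[0], upce[1:7], upce[7]
    d1, d2, d3, d4, d5, d6 = body
    if d6 in "012":
        middle = d1 + d2 + d6 + "0000" + d3 + d4 + d5
    elif d6 == "3":
        middle = d1 + d2 + d3 + "00000" + d4 + d5
    elif d6 == "4":
        middle = d1 + d2 + d3 + d4 + "00000" + d5
    else:  # d6 is 5 through 9
        middle = d1 + d2 + d3 + d4 + d5 + "0000" + d6
    first11 = number_system + middle
    if upca_check_digit(first11) != check:
        raise ValueError("check digit does not match")
    return first11 + check


print(expand_upce("04252614"))  # 042100005264
```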

    So I am cross-referencing ~8.5 million prices against about 7 million products that are broken up into 20 categories (i.e. databases), running at most 4 queries each. So what kills me is the queries across 20 DBs. I will try lumping all my normalized data into one table with 7 million rows to see what happens. FYI, I've found that UNIONing multiple queries together works much better than "OR", e.g.

    select id from tablename where upc='xxx' UNION DISTINCT select id from tablename where isbn='yyy'
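
    A small self-contained sketch of the two query shapes, using Python's built-in sqlite3 module purely for illustration (the thread does not say which database is in use). The `products` table, its columns, and the sample rows are made up. SQLite's plain UNION already removes duplicates, so the sketch uses that form. It only shows that both queries return the same ids; whether UNION is actually faster depends on the engine and the indexes, so measure on your own data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, upc TEXT, isbn TEXT);
    CREATE INDEX idx_products_upc ON products (upc);
    CREATE INDEX idx_products_isbn ON products (isbn);
    INSERT INTO products (id, upc, isbn) VALUES
        (1, '012345678905', NULL),
        (2, NULL, '9780306406157'),
        (3, '012345678905', '9780306406157'),
        (4, '036000291452', NULL);
    """
)


def find_ids_union(upc, isbn):
    """One indexed lookup per column, merged with UNION (duplicates removed)."""
    rows = conn.execute(
        "SELECT id FROM products WHERE upc = ? "
        "UNION "
        "SELECT id FROM products WHERE isbn = ?",
        (upc, isbn),
    ).fetchall()
    return sorted(row[0] for row in rows)


def find_ids_or(upc, isbn):
    """The same lookup written as a single query with OR."""
    rows = conn.execute(
        "SELECT id FROM products WHERE upc = ? OR isbn = ?",
        (upc, isbn),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(find_ids_union("012345678905", "9780306406157"))
print(find_ids_or("012345678905", "9780306406157"))
```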

    About your caching of results... absolutely, as soon as I get a better than 50% hit ratio.

    I'm running all of this on a quad proc 64 bit XEON with 4 GB of RAM. All queries are in stored procedures and I am currently CPU bound, not IO bound.

  6. #6
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    I think putting everything in a single table will help. That's what I do and it works great.

    So it sounds like you're extracting the match criteria first to use that as a lead-in for product matching. I have a slightly different method. I run a join against my product table and pull out matching pairs. So I'll grab two products that have the same UPC where at least one wasn't matched previously. Then I'll create a match in my criteria table or use the existing row if it already exists. This seems to work very well.
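
    One way to read that join-based approach, sketched with Python's sqlite3 module. The schema (`products` with a nullable `match_id`, plus a `matches` table), the function names, and the sample rows are all assumptions for illustration, not the poster's actual tables. The self-join only returns pairs where at least one side is still unmatched, so a rerun after an import only touches new products.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with made-up sample products."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE matches (id INTEGER PRIMARY KEY);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY, merchant TEXT, upc TEXT, match_id INTEGER);
        CREATE INDEX idx_products_upc ON products (upc);
        INSERT INTO products (id, merchant, upc) VALUES
            (1, 'A', '012345678905'),
            (2, 'B', '012345678905'),
            (3, 'C', '012345678905'),
            (4, 'A', '036000291452');
        """
    )
    return conn


def match_by_upc(conn):
    """Link products sharing a UPC; return the number of pairs processed."""
    pairs = conn.execute(
        """
        SELECT a.id, b.id
        FROM products a
        JOIN products b ON a.upc = b.upc AND a.id < b.id
        WHERE a.match_id IS NULL OR b.match_id IS NULL
        ORDER BY a.id, b.id
        """
    ).fetchall()
    for a_id, b_id in pairs:
        # Reuse an existing match row if either product already has one.
        existing = conn.execute(
            "SELECT match_id FROM products "
            "WHERE id IN (?, ?) AND match_id IS NOT NULL LIMIT 1",
            (a_id, b_id),
        ).fetchone()
        if existing:
            match_id = existing[0]
        else:
            match_id = conn.execute("INSERT INTO matches DEFAULT VALUES").lastrowid
        conn.execute(
            "UPDATE products SET match_id = ? WHERE id IN (?, ?)",
            (match_id, a_id, b_id),
        )
    conn.commit()
    return len(pairs)


demo = make_demo_db()
print(match_by_upc(demo))  # pairs found on the first run
print(match_by_upc(demo))  # nothing left to match on the second run
```

    The sketch does not handle merging two products that already belong to different match rows; a real process would need a rule for that case.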

    - Scott
