  1. #1
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Working with large feeds
Now that v2.5 is in testing, I'm focusing my efforts on WebMerge 3.0. Among other new features, it'll provide ways for the user to modify the source data feed file. I've built a table object for this, and it seems to hold up well enough with large feeds, but it does take a bit of time up front to load and render.

For example, this morning I was working with a feed from SAS, which decompresses to 95 MB and contains about 250,000 records. Once loaded, my table seems to handle scrolling surprisingly well, so moving around in the data is a breeze. But the load time is more than half a minute, and given the work that needs to be done under the hood to display it, I'm not sure I can optimize that much.

    So my question for you folks is:

    When you edit large feeds like this, what software do you use and how gracefully does it perform?

    Given the relatively inefficient buffering scheme used in Microsoft Excel, I can't imagine it's a snappy performer.

If you use Access, Open Office, or another DB, do you display all records in a list when you're working? How well do those programs hold up in terms of load time and scrolling around after the data is loaded?

    Thanks in advance -
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  2. #2
ABW Ambassador PatrickAllmond
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
I don't use your product (I am a MySQL/PHP person myself), but I preview any feeds with Notepad++ before I decide how to handle them. It can easily handle thousands of records; I've never had a problem with 50,000+ line files.
    ---
This response was masterfully crafted via the fingers of Patrick Allmond, who believes you should StopDoingNothing starting today.
    ---
    Focus Consulting is where I roll | Follow @patrickallmond on Twitter
    Search Engine Marketing | Search Engine Optimization | Social Media | Online Video

  3. #3
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Thanks for the feedback. Good to see that Notepad handles thousands of lines well. I wonder how it holds up with hundreds of thousands....hmmm...I'll go do some testing with it....
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  4. #4
ABW Ambassador PatrickAllmond
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
    I just want to make sure you saw I was referring to notepad++, not notepad.

    http://notepad-plus.sourceforge.net/uk/site.htm
    ---
This response was masterfully crafted via the fingers of Patrick Allmond, who believes you should StopDoingNothing starting today.
    ---
    Focus Consulting is where I roll | Follow @patrickallmond on Twitter
    Search Engine Marketing | Search Engine Optimization | Social Media | Online Video

  5. #5
ABW Ambassador Snib
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    I use PHP to preview my datafeeds. I display one record at a time through a web interface and can skip to any record in the feed. It's pretty quick.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  6. #6
Resident Genius and Staunch Capitalist Leader
    Join Date
    January 18th, 2005
    Location
    Florida
    Posts
    12,817
    For a WM site, my workflow goes like this:

    Get the big mega CJ feed
    Enter it into a MySQL table
Extract the data I want (usually all of one merchant's stuff) into a tab-delimited .txt. (Any changes needed are made before this point.)

    Use the .txt as the datasource for WM.
Upload the resulting pages to the server as a single zip file. Unzip. See what came of it. If the results are satisfactory, leave it alone; if not, cuss and debug.

I don't preview the records unless I get some really, really wonky results that force me to try to figure out what's wrong. I have a standard set of changes I make via MySQL before generating the .txt file, so the most common unwanted junk is removed without me ever needing to physically look at it.
    There is no knowledge that is not power. ~Hemingway

  7. #7
    Full Member
    Join Date
    October 22nd, 2006
    Posts
    200
    Quote Originally Posted by Snib
    I use PHP to preview my datafeeds. I display one record at a time through a web interface and can skip to any record in the feed. It's pretty quick.
If you have a feed with 50,000 records and you can load and evaluate each one in 3 seconds, that's 150,000 seconds, or 2,500 minutes; you would need to work for about 42 hours (if you don't get bored first) to complete one feed.

If the merchant updates the feed on a weekly basis, it looks like you might have a full-time job.

  8. #8
ABW Ambassador Snib
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by Donk
If you have a feed with 50,000 records and you can load and evaluate each one in 3 seconds, that's 150,000 seconds, or 2,500 minutes; you would need to work for about 42 hours (if you don't get bored first) to complete one feed.

If the merchant updates the feed on a weekly basis, it looks like you might have a full-time job.
    Why evaluate each product? I usually only look at a small handful to make sure prices are accurate and that the feed has sufficient information. Once I map it to my database my import scripts take care of the day-to-day price and inventory updates. I don't look at the feed again unless something goes wrong.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  9. #9
Full Member iolaire
    Join Date
    October 3rd, 2006
    Location
    Arlington, VA
    Posts
    229
To open files of any size (up to a few gigs, if you have the RAM), I use this at work:
http://www.lancs.ac.uk/staff/steveb/...fe/default.htm
Programmer's File Editor

  10. #10
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Quote Originally Posted by iolaire
To open files of any size (up to a few gigs, if you have the RAM)...
It's the "if you have the RAM" part that makes it tough. WebMerge can handle files up to 4 GB, but very few people have the RAM needed to work with files that size (nor, thankfully, are there datafeeds with that many products).
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  11. #11
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Quote Originally Posted by Snib
    Why evaluate each product?
    As I was working through some alternative designs yesterday, I came to the same question.

    While it's a nice technical flourish to be able to scroll data more gracefully than the best of Microsoft's products, is it really useful to display ALL of the data from a feed at once?

Previewing a few thousand records at a time for spot-checking may be sufficient, and could be done with lightning-fast load times.

    I would be interested in hearing from anyone who would prefer to see the entire feed at once, and why.

    Always up for learning more about how people work with their data....
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  12. #12
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Quote Originally Posted by Leader
    Get the big mega CJ feed
    Enter it into a MySQL table
Extract the data I want (usually all of one merchant's stuff) into a tab-delimited .txt. (Any changes needed are made before this point.)

    Use the .txt as the datasource for WM.
    Very helpful, Leader. Sounds like it would be handy if I could build the first few steps (load into MySQL and extract the records of interest) directly into WM itself. I should be able to save you those extra steps with WebMerge 3.0.

    On the other thread here, "Deep question about hierarchies", do your sites ever go more than 4 levels deep? You manage a lot of data, and quite successfully from what I can tell, so your input is always helpful and appreciated. Thanks in advance -
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  13. #13
    Newbie
    Join Date
    August 22nd, 2007
    Posts
    5
Using FileMaker Pro 9
    Hi Richard,

I'm a bit of a newbie with WebMerge, but I've started using FileMaker Pro and it does a good job handling and manipulating data. I can create a default DB with the data cleansing baked into the fields and executed upon import.

I then export to a .mer file. WebMerge has no problem handling 300K+ records that way (.csv files were choking).

One thing I'd like to see on the generation side: automatic creation of, and insertion into, a zip file. That way I'm handling one large file after WM runs.

    Thanks & love your product!

  14. #14
    Newbie
    Join Date
    July 11th, 2008
    Location
    Eastern USA
    Posts
    7
    Newbie using Excel
I'm weighing in as a newbie to affiliate marketing, datafeeds, and WebMerge. (Someone has to do it!) I have been at this for one week now...

    First, let me say, so far, my record sets are very, very small: 250 records, currently.

    Here's what I've done (and believe me, I am SO open to advice):

1. Initially, I took a look at the feed to understand which fields were present and, from that, built four HTML template files plus a template file that builds a PHP link-cloaking file. I am using Dreamweaver for my HTML and CSS editing.

2. In Excel, I created a "working file." This is a worksheet that mirrors the initial datafeed fields, to which I added a few of my own (like Category, CategoryToFilename, AddDate).

    I made sure that these "new" fields were shoved off to one side, so I could always cut and paste data from a new (daily) datafeed into the existing data (the working file) without overwriting data or separating new fields. This makes the fields that I did create easier to spot as well.

    When I did this, I did not understand whether I would be getting complete data in datafeeds or just updates in datafeeds. I'm still not certain about this either, but given the small record set, I can use Excel to remove duplicates based on the LinkID field (assuming, of course, that LinkID is unique and that if a duplicate LinkID appears, it represents an exact duplicate record - so far, that holds true.)


    Now, on a daily basis, I do this:

1. Get the new feed via email. I signed up for the "get your feed via FTP" option, but so far there's no datafeed on the server and no response to the email I sent to the Affiliate Network. So, for now, it arrives via email until I can find the time to email each and every Affiliate Manager with my FTP info. *grumble*


    2. I look at the datafeed in Excel. I don't go through with a fine-toothed comb but I look at it to spot anomalies or differences. Given that my record-set is very small, this is easy. But if the record set was very large, I would not be able to do this by hand. Spot checking would be the best I could do.

    3. If the record set is ok, I highlight, copy and paste it into my "working file."

    4. Since I created my own Categories, I assign Categories and CategoryToFilename data to each record. Then I insert today's date into the AddDate field for each record.

    5. I sort the data based on Expiration Date and remove records that have expired information.

    This is where I must be careful because it is possible to delete Categories in my datafile and I do not want to do that.

Each Category in the data file is "mapped" to a sub-directory on the web server. More importantly, if I deleted a Category from the datafile, I wouldn't know it was missing (because I don't keep a hardcoded list in my head of the Categories I have created), so when a new record that belongs in that Category appears, say, the following day, I don't want to go sifting through my file trying to remember the name of the Category I deleted. Convoluted, I know.

    So, for now, I just make sure I do not delete a Category from the data file.

6. I remove duplicate records based on LinkID. (A scripted version of steps 5 and 6 is sketched after this list.)

    7. I save the file as a text file and close Excel.

    8. I open WebMerge, load one settings file that I set up to autorun 4 other settings files and voila, website files.

9. Using CuteFTP, I load a saved queue, fire and forget.

    10. When it's done uploading the files to the server, I spot check the site.

    What I haven't been able to automate with WebMerge is the creation of multiple index.html files that go into separate sub-directories where the sub-directory name is based on the CategoryToFilename field.

    Is this possible with one settings file?

    *phew*

    Thank you!

  15. #15
    Affiliate Manager
    Join Date
    January 18th, 2005
    Location
    Los Angeles, California
    Posts
    1,913
    Quote Originally Posted by Sentinel
    What I haven't been able to automate with WebMerge is the creation of multiple index.html files that go into separate sub-directories where the sub-directory name is based on the CategoryToFilename field.

    Is this possible with one settings file?
Yes and no. That is, in versions through v2.5 you'll still need a separate settings file for each tier in your hierarchy. But you can queue up any number of settings files to run automatically after the current one finishes (see the bottom of the screen on the Generation tab), so in effect you get pretty much the same convenience when running it.
    Richard Gaskin
    Developer of WebMerge: Publish any data feed on any site
    http://www.fourthworld.com

  16. #16
    Newbie
    Join Date
    July 11th, 2008
    Location
    Eastern USA
    Posts
    7
    Richard,

    Thank you for your reply.

    The ability to run multiple settings files in sequence is a feature I discovered last week. It is truly a time-saver!

    I will give thought to using this to generate the index.html files for each sub-directory. With a little thought, I think I can find an elegant way to do so.

    Thank you.

  17. #17
Member tsmgroup2
    Join Date
    January 18th, 2005
    Location
    New Providence, PA USA
    Posts
    155
Linux mousepad handling
Quote Originally Posted by FourthWorld
It's the "if you have the RAM" part that makes it tough. WebMerge can handle files up to 4 GB, but very few people have the RAM needed to work with files that size (nor, thankfully, are there datafeeds with that many products).
I've had no problem, aside from maybe a little wait time, with any feed in excess of 512 MB on my system. Linux seems to handle these big feeds pretty well.

I do have a problem getting WebMerge 2.6b1 to run and process a few feeds around 1.1 GB in size. Is there something wrong with the software, the hardware, or me? I could really use your help on this one, Richard. Thanks.

PS: My desktop has 4 GB installed, but I'm not 100% sure all of it is available... Ubuntu has a way of reserving some.
    Last edited by tsmgroup2; July 3rd, 2010 at 01:37 PM. Reason: forgot something.
    Mark (Satchel)
    Webmaster / Sales Manager
    [url]www.tsmgroup2.biz[/url]
