  1. #1
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Quitting the datafeed project (again)
    Over the past three years, I've spent a lot of time working to devise a "datafeed management" software solution, along with content-management and related tools, that would allow me to more easily create and update original web site content pages that would include product information.

    Several times, I've become discouraged and chosen to put the project on "hold" while I did other work, but I've always been drawn back to this project. I strongly believe that with this tool, I could spend my time "doing what I do well," creating and managing original content on web sites that could drive highly-qualified traffic to merchants. But for the past several months, I've worked full-time on this project and nothing else.

    But I've faced a number of obstacles.

    The wide scope of the project, and the large number of relatively discrete elements, require thousands of decisions, each of which impacts the rest of the project. Every mistake forces a huge amount of back-tracking.

    One of my biggest problems has been my own far-outdated programming skills, and my inability to quickly learn (and retain) knowledge about LAMP programming (Linux, Apache, MySQL, PHP). I've tried several times to hire programmers, but after spending many thousands of dollars I'm now confident that I'm not able to competently select and hire "the right programmers" from among the many who respond to my bid solicitations and requests-for-proposals.

    This week, I spent some time reflecting and realized that after about five months of full-time effort this year, I've still not even implemented many of the features or functionality of Datafeed Studio, which I tried and concluded was far too limited to meet my needs. (My current tool-in-development does include some features and functions that aren't in Datafeed Studio, but it's not remotely ready for "real use.")

    I've made a lot of progress, but I must be realistic: at the pace I've been progressing, it is unlikely that I will have a working solution within a year (and perhaps not within several years). It is simply not reasonable for me to ask my wife to wait a year, or two, or five, while I continue to struggle on a task that might simply be impossible for me to complete alone.

    As I discussed this with my wife yesterday, I repeated several times that I strongly believe that this project might proceed much more quickly and effectively if I had "collaborators," but unfortunately my repeated efforts to recruit people to work on the project have failed to produce useful outcomes. (Each attempt has forced me to create a new, more detailed project specification -- my current specification is far more detailed than the RFPs I posted last year.)

    It is time, once again, for me to put my project back on the shelf and look for "real work" that can pay the bills.

  2. #2
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Mark, having read your posts here in recent years about this project, I truly hope things work out well for you. I know you and your wife have put a lot of "heart and soul," as well as time and treasure, into this project.

    I trust you'll find "real work" that's satisfactory -- and some day be able to come back to this project and get it done the way you dream to get it done! All the best to you.
    Generate more fake news.

  3. #3
    The Seal of Aproval rematt's Avatar
    Join Date
    November 19th, 2006
    Location
    The Windy City
    Posts
    4,140
    Mark, sorry to see you abandon your project, although I can certainly understand. Sometimes you just need to step away for a while and catch your breath. Trying to "normalize" the infinite number of variations in merchant and network feeds would be a pretty daunting task even for the most proficient and experienced programmer.

    Good luck as you reestablish your more mundane affiliate pursuits. The bright side is that maybe you'll have more time to hang out here now.

    -rematt
    "I know that you believe you understand what you think I said, but I'm not sure you realize that what you heard is not what I meant." - Richard Nixon

  4. #4
    ABW Ambassador ladidah's Avatar
    Join Date
    October 15th, 2007
    Location
    MA
    Posts
    1,888
    What Gary said.

    I am sure that this time has not been wasted and that you have learned a lot. Sometimes it is better to put things aside for a while, get a fresh look at things, and perhaps, if you so desire in the future, revisit with a new perspective and *aha moments*. Not to mention energy, money and gusto.

    Onward and upward!

  5. #5
    OPM and Moderator Chuck Hamrick's Avatar
    Join Date
    April 5th, 2005
    Location
    Park City Utah
    Posts
    16,646
    Mark, is this a commercial product you are trying to develop? Or is this just an interface to manage all your datafeed sites? If it's a personal endeavor, then why not automate what you can and accept that the rest is a manual process?

    I know several programmers who could help, as they automate websites as a day job in LAMP using PHP and Rails. They would want to charge an hourly rate that would probably bankrupt you.

  6. #6
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Sounds like you've just had some really bad luck with developers. I don't imagine the ones you'll find on general freelance sites will have much experience with this sort of thing. Also, datafeeds are very resource intensive, so if you don't have strong enough servers running them you'll hit hurdles right away. It's certainly best to focus on a niche rather than competing with the major price comparison sites. A common problem with running datafeed sites is people tend to combine many datafeeds across numerous niches, quickly using up their server resources while providing no specialized service to customers.

    As a good rule of thumb I'd start with 50-100 datafeeds on a single category like Shoes, Women's Clothing, Power Tools, or Kitchenware. Be selective and only take the pertinent categories from each datafeed. You usually don't want to use the entire feed, especially when working with department stores.
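
The "take only the pertinent categories" step can be sketched as a simple row filter. This is only an illustration, not anyone's actual import code: the column names (`sku`, `name`, `category`, `price`) and the sample rows are made up, and real merchant feeds label and spell their categories far less consistently.

```python
import csv
import io


def filter_feed(rows, wanted_categories):
    """Yield only the rows whose category is one of wanted_categories.

    Matching ignores case and surrounding whitespace. The "category"
    column name is an assumption; real feeds use many different headers.
    """
    wanted = {c.strip().lower() for c in wanted_categories}
    for row in rows:
        if row.get("category", "").strip().lower() in wanted:
            yield row


# Made-up sample feed standing in for one merchant's datafeed file.
sample_feed = io.StringIO(
    "sku,name,category,price\n"
    "1,Trail Runner,Shoes,59.99\n"
    "2,Stand Mixer,Kitchenware,199.00\n"
    "3,Leather Boot, shoes ,120.00\n"
)

shoes = list(filter_feed(csv.DictReader(sample_feed), ["Shoes"]))
```

Because the filter is a generator, a large department-store feed can be streamed through it without holding the whole file in memory.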

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  7. #7
    Merchant & ABW Ambassador
    Join Date
    May 31st, 2006
    Location
    Houston TX
    Posts
    4,731
    Quote Originally Posted by Snib
    As a good rule of thumb I'd start with 50-100 datafeeds on a single category like Shoes, Women's Clothing, Power Tools, or Kitchenware. Be selective and only take the pertinent categories from each datafeed. You usually don't want to use the entire feed, especially when working with department stores.

    - Scott
    When you mention 50-100 datafeeds, would that be 50-100 stores? Usually, a store will set up just one datafeed with around 10-20 categories. If you are looking at a large retailer, that might be a large datafeed.

    I am asking because, if that is the case, you will need a dedicated server.

  8. #8
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by Eric Ewe
    When you mention 50-100 datafeeds, would that be 50-100 stores? Usually, a store will set up just one datafeed with around 10-20 categories. If you are looking at a large retailer, that might be a large datafeed.

    I am asking because, if that is the case, you will need a dedicated server.
    Generally stores just have a single datafeed, but to clarify I do mean 50-100 stores. Any less and you really aren't providing enough value to the customer. And definitely, you need a dedicated server.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  9. #9
    Merchant & ABW Ambassador
    Join Date
    May 31st, 2006
    Location
    Houston TX
    Posts
    4,731
    OK. I never thought that you'd need 50-100 stores. That is a lot to chew on.

  10. #10
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by Eric Ewe
    OK. I never thought that you'd need 50-100 stores. That is a lot to chew on.
    It really depends on the niche, but it's a good rule of thumb to pick a niche that could accommodate that goal. We've got to consider that we're up against the big boys here like Shopping.com, Shopzilla and Pricegrabber. It's unreasonable for any single individual to compete with them directly. The best way to do it is to pick a small segment and do just that category better than them.

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  11. #11
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    I've always been interested in your project since before I even had a clue how to program in PHP using MySQL for databases and I wished you well with it.

    In my own experience over this last year, I've discovered that it is much better to start with small achievable goals and once accomplished, make a new goal and expand from that. Even those small simple goals are not easy for a newbie (like me) to reach. As one progresses, the additional tasks and new goals don't seem as difficult since each goal completed lets the programmer learn how to do something.

    While I have needed to rewrite different parts of my scripts several times in the process, I believe that if one year ago I had set the goal to have what little I have now, I would have given up on it months ago with nothing to show for the effort.

    Instead of depending on outside programmers for everything, it really is time to learn PHP. Even if you outsource parts of your system, you really should be in charge of it, since you want it so exact, which means you'll need to be able to read the code and make changes to it.

    Quote Originally Posted by markwelch
    ...
    But I've faced a number of obstacles.

    The wide scope of the project, and the large number of relatively discrete elements, require thousands of decisions, each of which impacts the rest of the project. Every mistake forces a huge amount of back-tracking.
    ...
    One of my biggest problems has been my own far-outdated programming skills, and my inability to quickly learn (and retain) knowledge about LAMP programming (Linux, Apache, MySQL, PHP). I've tried several times to hire programmers, but after spending many thousands of dollars I'm now confident that I'm not able to competently select and hire "the right programmers" from among the many who respond to my bid solicitations and requests-for-proposals.

    ... (Each attempt has forced me to create a new, more detailed project specification -- my current specification is far more detailed than the RFPs I posted last year.)

  12. #12
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Chuck wrote: > "Mark, is this a commercial product you are trying to develop? Or is this just an interface to manage all your datafeed sites? If its a personal endeavor then why not automate what you can and realize that the rest is a manual process." <

    This is a system for my own use, not a commercial product.

    The problem is that without having most of the pieces in place, all the work I do today will become stale in a few months, and the update cycle becomes overwhelming. I've done this before, many times -- create a site, striving to make it relatively easy to update, but a year later it's a hopeless mess.

    Snib wrote: > "As a good rule of thumb I'd start with 50-100 datafeeds on a single category . . . Be selective and only take the pertinent categories from each datafeed." <

    I understand the concept of focus -- but here, the same effort is required to import 50 datafeeds from 6 networks as it takes to import 500 or 1,500 datafeeds from those same networks. And as you know, most merchants' categorization is troublesome; a huge amount of cleanup work is required (either manual or automated).

    There is a huge trade-off between

    Snib wrote: > "We've got to consider that we're up against the big boys here like Shopping.com, Shopzilla and Pricegrabber. It's unreasonable for any single individual to compete with them directly. The best way to do it is to pick a small segment and do just that category better than them." <

    Perhaps I've created some confusion because I posted a recent query about "parameters for search-results display," in which my examples queried against the full 800-merchant product test database. If I focused on a single product category with only 12 or 50 merchants, then the issues involved in that particular effort would have been more focused -- but then later I would need to re-visit the problem once additional product niches and merchant feeds were introduced.

    I'm not trying to create a price-comparison engine, nor to compete with those companies. You're right about the issue of picking segments.

    But again, the effort is nearly the same to build an infrastructure to import "everything," as to implement a "partial solution." (In any programming project, the effort required to create a "single use solution" is usually about 90% of the effort required to make a "dual-use solution," which in turn is about 90% of the effort required for a general multi-use solution.)

    If I try to create a limited system that focuses only on a single product category, I'll still need to implement 90% of the same filtering and updating functionality, and by limiting the project scope, I'd make it much more difficult to extend to a second or third category.

    My current system imports about 800 datafeeds from ShareASale and Avantlink, and then for each specific site I use filters that query only against a small subset of the data.

    And of course, my entire plan was to start with a single category -- currently, the site for my "test category" draws from only 6 merchants' datafeeds.

    Snib also wrote: > "A common problem with running datafeed sites is people tend to combine many datafeeds across numerous niches, quickly using up their server resources while providing no specialized service to customers." <

    Both Snib and Eric noted: > "you will need a dedicated server." <

    While server resources are certainly an issue during testing (since I'm using a $50-per-month VPS account for this purpose), I certainly intend to use a dedicated server (or several) for late-stage testing and for the live sites. During some phases of testing, I've limited the amount of data being imported (first, by limiting the number of feeds included; later, by limiting the maximum number of records imported from each datafeed).
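
The two test-time limits described above (fewer feeds, then a cap on records per feed) might look something like this. It is a minimal sketch under assumed names: `import_for_testing`, `max_feeds`, and `max_records_per_feed` are hypothetical, and the feeds here are generated in memory rather than downloaded from a network.

```python
from itertools import islice


def import_for_testing(feeds, max_feeds=None, max_records_per_feed=None):
    """Import a reduced data set for testing on a small server.

    feeds is an iterable of (merchant_name, iterable_of_records) pairs.
    Passing None for either limit means "no limit".
    """
    imported = {}
    for merchant, records in islice(feeds, max_feeds):
        # islice stops reading as soon as the cap is reached, so the
        # remainder of a very large feed is never read or parsed.
        imported[merchant] = list(islice(records, max_records_per_feed))
    return imported


# Made-up feeds: each is a lazy stream of records.
feeds = [
    ("merchant_a", ({"sku": i} for i in range(1000))),
    ("merchant_b", ({"sku": i} for i in range(5))),
    ("merchant_c", ({"sku": i} for i in range(50))),
]

subset = import_for_testing(feeds, max_feeds=2, max_records_per_feed=10)
```

Keeping both limits as parameters means the same import path can later run uncapped on a dedicated server.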

  13. #13
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by markwelch
    I've made a lot of progress, but I must be realistic: at the pace I've been progressing, it is unlikely that I will have a working solution within a year (and perhaps not within several years). It is simply not reasonable for me to ask my wife to wait a year, or two, or five, while I continue to struggle on a task that might simply be impossible for me to complete alone.
    You're a smart guy, Mark, and I've been impressed with your posts. It sounds like you have evaluated the current state of things and you are making an informed decision.

    It could be that a technical approach is wrong for you, but your affiliate marketing knowledge can still generate a decent income. Perhaps you should instead turn your plans around and work on substituting human labor from Elance or a similar service. Try to think of website generation and maintenance as a franchise-type problem and work on documenting the steps required. Then you would have a management blueprint for creating and maintaining websites via outsourced labor.

    That being said, I certainly believe that these types of endeavors are best done as side projects until they have proven themselves.

    Best of luck,
    Jim
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  14. #14
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by markwelch
    But again, the effort is nearly the same to build an infrastructure to import "everything," as to implement a "partial solution." (In any programming project, the effort required to create a "single use solution" is usually about 90% of the effort required to make a "dual-use solution," which in turn is about 90% of the effort required for a general multi-use solution.)
    I disagree. It's not nearly the same because you need a much stronger hardware infrastructure to handle a larger scope. If you really are using 800 datafeeds on a VPS then no wonder you have problems. You've got to scale within your hardware constraints. You can build a system with all the functionality you'd like, but you've got to test it with a small data set and scale as you go along.

    When I first started working with datafeeds my system was very inefficient and could only handle so many products before it started having scaling issues. But over the years I've improved my code and hardware to handle more and more products. I add only so many at a time and test for several months to see how they fare in my current environment. If I had started with the volume of products I have today I'd never have got it off the ground. It's a very natural progression that requires small steps and in this case only a few datafeeds at a time.

    What I'm really trying to say is don't build a huge database of everything and segment it out to different sites. The hardware requirements for even 1 million products are high (at least a dual core dedicated server with 4-6 GB of RAM). But if you build a handful of sites with different databases, maybe 100,000 products each, then you'll have a much easier time. Combining a lot of data into a single place is never a good idea unless you're an expert at database optimization. Even the pros divide up their data into multiple tables so no single table has more than a million or two rows.
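
One way to picture the "separate database per site" idea is to route each product to a small per-niche store at import time. The sketch below is an assumption-laden illustration: it uses SQLite in-memory databases purely for convenience (the thread is about MySQL), and the table layout, the `niche` field, and the niche names are all hypothetical.

```python
import sqlite3


def open_niche_db(path=":memory:"):
    """Open (or create) one small database holding a single niche."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products "
        "(sku TEXT, merchant TEXT, name TEXT, price REAL)"
    )
    return conn


# One database per site/niche instead of one huge shared database.
niche_dbs = {"shoes": open_niche_db(), "kitchenware": open_niche_db()}


def store(product):
    """Insert a product into its niche database; skip unknown niches."""
    conn = niche_dbs.get(product["niche"])
    if conn is None:
        return False  # no site uses this niche, so it is never stored
    conn.execute(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        (product["sku"], product["merchant"], product["name"], product["price"]),
    )
    return True


incoming = [
    {"niche": "shoes", "sku": "1", "merchant": "A", "name": "Trail Runner", "price": 59.99},
    {"niche": "kitchenware", "sku": "2", "merchant": "B", "name": "Stand Mixer", "price": 199.0},
    {"niche": "power tools", "sku": "3", "merchant": "C", "name": "Drill", "price": 89.0},
]
stored = [store(p) for p in incoming]
shoe_count = niche_dbs["shoes"].execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

The trade-off is that a query spanning several niches now has to visit several databases, which is the cost of keeping each one small.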

    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  15. #15
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by Snib
    The hardware requirements for even 1 million products are high (at least a dual core dedicated server with 4-6 GB of RAM).
    - Scott
    Yup, totally agree. Another point... At 30 million products, I was using a dual quad core with 8GB of ram and 15k RPM SCSI and that was JUST for datafeed processing. I've since moved way beyond those hardware specs. Datafeed processing is a power hungry endeavor.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  16. #16
    ABW Ambassador
    Join Date
    January 18th, 2005
    Location
    England
    Posts
    4,327
    30 million products? Holy crap

  17. #17
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    My problems with this project have not been due to hardware or server performance, nor due to database performance. I wish I could have progressed far enough where those issues arose.

  18. #18
    .
    Join Date
    January 18th, 2005
    Posts
    2,973
    Julian wrote: > "30 million products? Holy crap." <

    Holy crap, indeed:

    (1) That is a big number, but it still represents only a subset of available datafeeds from affiliate-program merchants. PopShops currently reports that its database contains "49,607,500 products from 1,757 merchant datafeeds," from just 7 networks -- and that excludes many hundreds of available datafeeds from merchants in those networks. (My most recent test database contains 2.3 million products from about 800 merchants in just 2 networks, but excludes several of the largest datafeeds from those networks.)

    (2) There really is an awful lot of crap in that data, requiring an immense amount of planning and design to filter and correct the data (and of course, some immense CPU and disk-access requirements while doing so). If the goal is to create a useful price-comparison engine, there is an immense amount of additional work (and CPU usage) to normalize data and match up products.
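
To make "normalize data and match up products" concrete, here is a minimal sketch that groups offers from different merchants by a cleaned-up UPC. It assumes every feed carries a `upc` field, which many real feeds do not; matching on names or manufacturer part numbers is a much harder problem and is not attempted here. The function and field names are hypothetical.

```python
from collections import defaultdict


def normalize_upc(raw):
    """Strip spaces and dashes and left-pad to 12 digits; None if empty."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.zfill(12) if digits else None


def cluster_by_upc(offers):
    """Group (merchant, price) pairs under their normalized UPC."""
    clusters = defaultdict(list)
    for offer in offers:
        key = normalize_upc(offer.get("upc", ""))
        if key is not None:  # offers with no usable UPC are left unmatched
            clusters[key].append((offer["merchant"], offer["price"]))
    return dict(clusters)


# Made-up offers: the same product written two different ways, plus one
# offer that cannot be matched at all.
offers = [
    {"merchant": "A", "upc": "0 12345 67890 5", "price": 19.99},
    {"merchant": "B", "upc": "12345678905", "price": 18.49},
    {"merchant": "C", "upc": "", "price": 5.00},
]

clusters = cluster_by_upc(offers)
```

Even this toy version shows why cleanup comes first: without normalization, merchants A and B would look like they sell two different products.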

  19. #19
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by markwelch
    Julian wrote: > "30 million products? Holy crap." <

    Holy crap, indeed:

    (1) That is a big number, but it still represents only a subset of available datafeeds from affiliate-program merchants. PopShops currently reports that its database contains "49,607,500 products from 1,757 merchant datafeeds," from just 7 networks -- and that excludes many hundreds of available datafeeds from merchants in those networks.
    That's kind of interesting. I would have thought that they would have had more. BTW, I'm well beyond 49 million now, but there are certain categories that severely increase the number of products. So much so that I've considered filtering them out. Sports fan shops, for instance. Imagine a shirt/jersey/hat/pen for every professional sports team in the world...

    Did you know there are 172k books published each year in the US alone? http://en.wikipedia.org/wiki/Books_p...untry_per_year
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  20. #20
    Newbie
    Join Date
    September 16th, 2009
    Location
    Agoura Hills, CA
    Posts
    17
    I agree
    Some categories are just not worth the while, especially when making them searchable. There is a lot of tuning to do, depending on how you want to return the search results.

    We process close to a 100 million feeds almost in real time at the moment. Buy.com currently takes 20 minutes to be updated, price compared, categorized, indexed, replicated, and searchable on live servers. However, we have been testing different database setups and are now convinced that we can get the buy.com feed done in under a minute. Sometime next year the switch will be made.

    When we started 3.5 years ago, updates took a week in batch jobs with 15 million products.

  21. #21
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by adamson
    We process close to a 100 million feeds almost in real time at the moment.
    Real-Time is pretty cool. What type of hardware are you using? I'm not sure how many items I'm processing now, but it is about 680 merchants that results in 38 million product clusters where a product cluster could have anywhere between 1 and 30 merchant prices.

    It takes my code 12 hours to run through all 680 feeds. I'm using dual quad core Nehalem 5570's, 36GB of RAM and a raid 0 array of Intel 64GB SSD drives.
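
For anyone sizing a server from these figures, the averages work out as follows. The inputs are the numbers quoted in this post; treating all 38 million clusters as touched once per 12-hour run is an assumption made only for the sake of the estimate.

```python
# Figures quoted in the post; everything derived is simple division.
feeds = 680
run_hours = 12
product_clusters = 38_000_000

run_seconds = run_hours * 3600

# Average wall-clock time spent per feed over the whole run.
seconds_per_feed = run_seconds / feeds

# Average clusters handled per second, assuming each cluster is
# processed once per run (an assumption, not stated in the post).
clusters_per_second = product_clusters / run_seconds

print(f"{seconds_per_feed:.1f} s per feed on average")
print(f"{clusters_per_second:.0f} product clusters per second on average")
```

These are averages only; a single large feed can take far longer than a small one.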

    Real-Time hasn't been a priority for me yet, but I can certainly see it being important in the next few years as my business model evolves.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  22. #22
    Member BeepBeep's Avatar
    Join Date
    January 18th, 2005
    Location
    NJ Exit 2
    Posts
    61
    Try this:

    1. Copyright the work so far via Copyright.gov. Might cost ~$50. Not sure exactly how much.

    2. Assign the copyright to an "organization". Register the .Org. Assign the rights.

    3. Open the site + forum. Make the script(s) available via GNU license.

    4. See IF the world shows up and starts a project that just might be worthwhile.

    Just a thought. Easy for me to suggest it, but given the time/money/effort you have invested it just might be worthy of "putting it out there". There just might be some uptake on producing something that "a community" might benefit some.

    Downside? Well, when everyone has access to a workable multi-vendor feed then how will any site compete?

    Hmmm . . . Well . . . Doh! They will compete based upon a value proposition OTHER THAN the grunt work of cranking out feeds, and since when - other than "I have one and you don't!" - have feeds been a UVP, one worthy of getting your site to rank, getting your site to stand out, etc.?

    And, for all of this generosity, maybe . . . just maybe . . . you might gain a bit . . .

    Anyone know anyone whose efforts in the open source community led to some degree of "other commercial success"?

    Man . . . it is so easy to delegate dreams and work to someone else, isn't it? :P

    OTOH, maybe sometimes . . .

    Good luck with your project, GNU or otherwise.
    I'll get the hang of it eventually. :0)

  23. #23
    Newbie abarba's Avatar
    Join Date
    August 11th, 2008
    Location
    Berlin
    Posts
    6
    @Snib -
    "hardware requirements for even 1 million products is high (at least a dual core dedicated server with 4-6gb of ram)"
    @isellstuff -
    "It takes my code 12 hours to run through all 680 feeds. I'm using dual quad core Nehalem 5570's, 36GB of RAM and a raid 0 array of Intel 64GB SSD drives."
    I'm looking for a server and both of your insights were VERY helpful in making my decision. I am going with a flexible/expandable VPS for now and will work with fewer than 50 merchants to test our system. As we grow, I will be very excited to pay for a dedicated monster like yours.

  24. #24
    Newbie spiritmonk's Avatar
    Join Date
    December 2nd, 2009
    Location
    Bangalored unfortunately
    Posts
    26
    vioet!
    When you guys speak of 30, 100, or 400 million products to be processed, I would definitely say that you fall into enterprise-architecture territory. For those cases I would say forget PHP and switch to J2EE, which can reduce your hardware requirements too.

  25. #25
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by spiritmonk
    vioet!
    When you guys speak of 30, 100, or 400 million products to be processed, I would definitely say that you fall into enterprise-architecture territory. For those cases I would say forget PHP and switch to J2EE, which can reduce your hardware requirements too.
    I'm using c# without a relational database.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!
