  1. #1
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    superCool’s Super Database
    superCool would like to create a super database with all the feed products for all of the programs he’s promoting. This could grow to more than a million products. superCool would like to load up all the products and then extract certain items for various websites which each have their own smaller dbs. superCool will also run some analysis and matching scripts against the big database.

    Would this be too much to put on a HostGator shared host, and would it affect the performance of other sites and dbs? Is it feasible to have a db this size on superCool’s run-of-the-mill (but decent) home PC?

    Any suggestions in this area? Would it be better to build several smaller databases? Does anyone have experience with this type of super database?

    Thanks for any input!

  2. #2
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Guess it would depend on how information is displayed on your site for starters.

    If you are displaying 1MM products when the bots come, it could wreak havoc with your shared hosting. If the database is used for pulling products when someone uses a search function, then you should be OK. One of our dbs is 3GB and we have no problems; granted, it's on one of our own servers.

    You can overload the MySQL server depending on the number of queries running at one time. Populating/updating the db could be a concern on a shared account.

    I'll defer to the others for more advice; just wanted to share my thoughts...
    Salty kisses, Sandy toes, and a Pirate's heart...

  3. #3
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Definitely too much for a shared account... it will take 20+ hours to update the database, and there will be too much load if multiple domains query the one database.

    To be honest, it's probably even too much for a standard VPS account (which is not as robust as hosts would have you think!).

    It sounds like you're a candidate for dedicated hosting.. one server all your own, and you can add as many accounts (i.e., domains) as you like. The extra monthly cost could be partially offset by consolidating those multiple shared accounts.

    For my own sites, I moved from Shared to VPS to Dedicated over the course of a couple of years. The important thing is to choose a host with proper support, so you can pick up the phone at any time.

    At first I was intimidated by the concept - I thought I would be responsible for hosting/system upgrades. But that's not the case at all - it's still under the host's care; you just have the ability to do whatever you want.

    If the server crashes from a heavy load, there is no "penalty box" or repercussions. My shared host used to delete scripts, and my VPS host used to leave the server offline..

    I'm not going to get into hosting suggestions here, as that's not your question (!), but PM me if you want to know who I use.

  5. #4
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    The sites won't be running off the huge db - it's just a repository for all the datafeed products. superCool will run scripts against the big db to extract/export products which will then be imported into the smaller db for each site. So the site performance will be normal, but the super db will apparently be crap. The goal is to easily pull very niche specific products from all merchants and send them to the appropriate site db.

    It does seem like this db will be too large. It would be so cool to have everything in one place where you could run queries and whatnot against every item. Maybe superCool should spread things out over several dbs and run his scripts against them all.

    Don't think dedicated hosting is in the cards for now. This big db is not necessarily required for superCool, but is a nice-to-have that could make things easier and better.

    Thanks for the input. More is welcome.

  6. #5
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Ahh, ok I see what you mean now..

    You can actually create and maintain the database on a local PC - I'm not too familiar with running dual environments, but my desktop has Cygwin, which provides a Linux-like environment on Windows.

    If MySQL (and the monster database) were installed on your local PC, I see no reason you couldn't run a series of "mysqldump" commands to generate smaller files, which you can then FTP to your host. Then, on each site, run the corresponding "mysql" import command.
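    Something like this, as a rough sketch in Python (the database, table, and column names are made up, and it assumes mysqldump is on your PATH - adapt to your own schema):

    [CODE]
    # Sketch: export one niche subset per site from a master products table
    # using mysqldump's --where option; each dump file can then be FTP'd to
    # the host and imported with "mysql". All names below are hypothetical.
    import subprocess

    NICHES = {
        "fishingsite": "category = 'fishing'",
        "golfsite": "category = 'golf'",
    }

    for site, where_clause in NICHES.items():
        dump_file = f"{site}_products.sql"
        with open(dump_file, "w") as out:
            subprocess.run(
                ["mysqldump", "-u", "root", "-psecret",
                 f"--where={where_clause}",  # only rows matching this niche
                 "masterdb", "products"],    # database and table to dump
                stdout=out, check=True)
        print(f"Wrote {dump_file}; upload it, then run: mysql sitedb < {dump_file}")
    [/CODE]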
    Last edited by teezone; April 5th, 2012 at 02:53 PM. Reason: spelling

  7. #6
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Ahh, here is a blast from the past:

    http://www.abestweb.com/forums/dataf...dem-74918.html

    Now, let me tell you what happened, hindsight being 20/20 and all that. The first problem was getting the datafeeds to my home computer. I wanted to script it, and I had to use my front-end server as a proxy to grab files. This might still be a problem with GAN? They FTP feeds to my server; is there another way? When you don't have a static IP, it's a bit of a pain. Luckily the IP is settable in the UI nowadays.

    I ran this feed-snarfing/import script on my home computer for a couple of years. I just wasn't serious about the freshness of my data back then. Freshness means a lot, IMHO. So when items went out of stock, I had dead links on my websites. And I didn't care. Big mistake, as it really, really hurt conversions.

    One thing I learned: make sure your home computer has RAID mirroring and a backup power supply (UPS) that can keep your computer on for a while. Also make sure the UPS gracefully shuts down your computer when the power is running out. Those corrupt MySQL databases from power failures or disk failures really set you back...

    When I got serious about freshness, I realized I needed to automate everything and update my web servers regularly. The upload speed of my home computer then became a big issue. I was uploading maybe 1GB at the time. It took way too long.


    Here is an idea: perhaps, using webservices from LinkShare/CJ, you could search the datafeeds in real time (or next to real time) instead? It would keep your sites fresher.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  8. #7
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    thanks for the tips everyone. isellstuff, GREAT to see you popping your head back in here. Please stay!

    superCool should get more into the webservices thing. He has some scripts that work but has never taken full advantage of them. Some merchants also provide incremental feeds and/or change dates on the rows, which could also help if used properly (see the sketch below). It just seemed so easy to throw everything into one huge super db and have at it, but it's probably not feasible for superCool. Currently switching things over to a new desktop, so superCool hasn't tried a large db yet. The old PC couldn't handle it.
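    Something like this sketch is what superCool has in mind for the incremental part (the file, table, and column names are made up, and it uses sqlite3 just to keep the example self-contained - the same idea applies to MySQL):

    [CODE]
    # Sketch: apply only the feed rows changed since the last run, keyed off
    # a "last_updated" column in the merchant feed. Names are hypothetical.
    import csv
    import sqlite3

    conn = sqlite3.connect("superdb.sqlite")
    conn.execute("""CREATE TABLE IF NOT EXISTS products
                    (sku TEXT PRIMARY KEY, name TEXT, price REAL,
                     last_updated TEXT)""")

    # High-water mark: the newest change date already loaded into the db.
    watermark = conn.execute(
        "SELECT MAX(last_updated) FROM products").fetchone()[0] or "1970-01-01"

    # Keep only the feed rows newer than the watermark (ISO dates assumed).
    with open("merchant_feed.csv", newline="") as f:
        changed = [row for row in csv.DictReader(f)
                   if row["last_updated"] > watermark]

    conn.executemany(
        "INSERT OR REPLACE INTO products "
        "VALUES (:sku, :name, :price, :last_updated)",
        changed)
    conn.commit()
    print(f"Applied {len(changed)} changed rows")
    [/CODE]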

    Thanks again friends

  9. #8
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Webservices aside (which I know very little about!), I have some of my site backups sent to my desktop PC..

    I don't think downloading datafeeds from the networks via FTP would be an issue (twice a week would be more than enough for freshness). And when you have a finished new database, you can gzip the "mysqldump" output, often down to roughly 20% of the original size. For example, a 500MB database could be compressed to 100MB, which is quite easy to upload. Once it has been uploaded, you perform the corresponding unzip, then import into MySQL.
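    A rough sketch of the dump-and-compress step (the credentials and db names are placeholders):

    [CODE]
    # Sketch: dump a site's database, then gzip it so the upload is small.
    import gzip
    import shutil
    import subprocess

    with open("site1.sql", "w") as out:
        subprocess.run(["mysqldump", "-u", "root", "-psecret", "site1db"],
                       stdout=out, check=True)

    with open("site1.sql", "rb") as src, gzip.open("site1.sql.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)  # text dumps typically compress very well

    # ...FTP site1.sql.gz to the host, then on the host:
    #   gunzip site1.sql.gz
    #   mysql -u user -p site1db < site1.sql
    [/CODE]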

    Agree that backup of the desktop is VERY important.. and thanks for the reminder isellstuff!!

  10. #9
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Yes, granted my experience is based on doing something at a scale that few affiliates undertake, for good reason. Snib called it though when he said I was introducing a bottleneck.

    I've seen a lot of evergreen affiliate websites that should do quite well but are missing many opportunities because their auto-population of products is poorly implemented or out of date. It's the kind of thing that gives affiliates a bad name in the eyes of Google.

    If you are a programmer, try webservices. I have a couple of evergreen-type sites that I play around with for SEO purposes. I basically compute the content for the website once and write it to disk, then periodically update that content on a separate thread (because the compute takes a while). I'm not just throwing up raw data; I'm filtering, sorting, and normalizing in an attempt to give my evergreen sites a feeling of "freshness" and "quality".
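    A rough sketch of that pattern (in Python rather than my C#, just for illustration - build_page() stands in for all the filtering/sorting/normalizing):

    [CODE]
    # Sketch: serve a prebuilt page from disk and rebuild it periodically on
    # a background thread, so the expensive compute never blocks a visitor.
    import pathlib
    import threading
    import time

    CACHE = pathlib.Path("page_cache.html")

    def build_page() -> str:
        # ...expensive filtering / sorting / normalizing would happen here...
        return f"<html><body>Fresh picks as of {time.ctime()}</body></html>"

    def refresh_loop(interval_sec: int = 3600) -> None:
        while True:
            CACHE.write_text(build_page())  # rebuild off the request path
            time.sleep(interval_sec)

    threading.Thread(target=refresh_loop, daemon=True).start()

    def serve() -> str:
        # Requests only ever read the prebuilt file, so they stay fast.
        return CACHE.read_text() if CACHE.exists() else build_page()
    [/CODE]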

    They said that one of the criteria for Panda was "Would you trust this website?" based on visual inspection. I think about that a lot when I design some of my side-project websites. I want to create something that takes no effort to maintain but is still useful and trustworthy to a random visitor.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  11. #10
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    That sounds a lot like what superCool is doing ... quite a bit of filtering, renaming, matching, categorizing, optimizing, etc., but up to this point it's a manual process to run the jobs on the home computer and then upload the files and load them into the db. Each website has its own set of config files and its own processing scripts, and it takes a while to get everything set up. superCool is trying to get things simplified and more flexible, hoping to be able to leverage more work across all sites and categories. It's something superCool is very interested in, but time is so limited. Thanks for your insight everyone. It helps

    isellstuff - is your system written in php or do you use something like java for the tough crunching?

  12. #11
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by superCool View Post
    isellstuff - is your system written in php or do you use something like java for the tough crunching?
    Mine is in C#... I spent my first 10 or so years as a computer engineer porting 3D graphics applications to Windows from various Unix flavors. I'm afraid I am firmly entrenched in the Windows camp, even though I KNOW Linux is better. I like all the open-source software for Linux and actually use the .NET ports of a lot of it.

    I'm on dedicated hosting, so I might have a few more options, e.g. cron jobs and whatnot. Can you write to disk on shared hosting?

    I've also got monster servers, BTW... For very heavy lifting, I have a dedicated feed-crunching machine. In the thread I posted, you'll notice that snib says he did his processing at night. That might be the way to do it, 3 AM and all.

    Just a random idea, maybe of no merit... I have to "warm up" my main website by initializing a lot of static structures, which takes about 5 minutes. I use a website monitoring service like pingdom.com to hit a webpage periodically to ensure that my global, in-memory caches stay current. Anyway, the point is that I periodically hit a specific page to keep things fresh, current, and in memory.
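    If you'd rather not use a monitoring service, even a trivial script on any always-on box does the same job (the URL here is a placeholder):

    [CODE]
    # Sketch: hit a "warm-up" page on a schedule so the site's in-memory
    # caches never go cold between real visitors.
    import time
    import urllib.request

    while True:
        try:
            with urllib.request.urlopen("http://example.com/warmup") as resp:
                print("warm-up hit:", resp.status)
        except OSError as err:
            print("warm-up failed:", err)
        time.sleep(300)  # every 5 minutes
    [/CODE]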

    It's important to note that this webservice munging on a shared hosting service, or processing on your home computer, is a stop-gap measure to save money; at some point you should probably make the leap to a more dedicated environment. I've always overpaid for my server hardware, but I look at it as a necessary expense to ensure my biz keeps growing. I made the jump to dedicated hosting while I still had my day job. Back then I was rolling all of my profits back into the biz to support growth. By the time I quit my day job, I actually had three dedicated servers.
    Merchants, any data you provide to Google Shopping should also be in your affiliate network datafeed. More data means more sales!

  14. #12
    Member
    Join Date
    May 20th, 2007
    Posts
    55
    We do something similar for about 20MM products: we process the feeds daily across multiple servers using Scala, which feeds into a few Solr instances.

    The processing of the feeds is the hardest part. If you get 1MM products a day, the speed at which you process them will be determined by your server(s). If you're only looking to update things once a week, you've got a lot more time to process the data; however, if you want daily updates, you'll have to go for a gruntier server.

    One idea is to pull the files into Amazon S3 and fire up some EC2 instances to process your data quickly, then import the results into your db. You can then shut the EC2 instances down right away; that way you're only paying for the CPU time you actually use.
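    A rough sketch of that burst pattern using boto3 (in Python for illustration; the AMI ID, instance type, and processing script are all placeholders):

    [CODE]
    # Sketch: start a big EC2 box, let it crunch the feeds via its user-data
    # script, then terminate it so you only pay for the hours actually used.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-12345678",      # placeholder AMI with your tools baked in
        InstanceType="c5.4xlarge",   # lots of CPU, rented by the hour
        MinCount=1, MaxCount=1,
        UserData="#!/bin/bash\npython3 /opt/process_feeds.py\n",  # hypothetical job
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    print("Crunching on", instance_id)

    # ...poll until the job signals completion (e.g., results land in S3)...

    ec2.terminate_instances(InstanceIds=[instance_id])  # stop the meter
    [/CODE]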

    Hope this helps.
    Merchants, build your own coupon landing pages with [URL="http://couponzor.com/"]Couponzor[/URL] - Example Landing Page: [URL="http://istockphoto.couponzor.com/"]iStockPhoto Coupons[/URL]

  15. #13
    ABW Ambassador kse's Avatar
    Join Date
    November 29th, 2005
    Posts
    2,511
    Quote Originally Posted by superCool View Post
    superCool would like to create a super database with all the feed products for all of the programs he’s promoting. This could grow to more than a million products. superCool would like to load up all the products and then extract certain items for various websites which each have their own smaller dbs. superCool will also run some analysis and matching scripts against the big database.

    So did you ever set this up?? I'm thinking about setting up one database with fewer than 50K products, and I will use this database on several other sites that are hosted on other servers/hosts. I'm tired of having so many small databases to take care of.

    Note: I will not be using the same products on all the different sites; most products will only be shown on one site (maybe two at the most)
    MERCHANTS: Start showing your coupons directly on your site, that way your shoppers will stop leaving your site looking for them!! If not then remove your Coupon Box!!

  16. #14
    Member Prosperent's Avatar
    Join Date
    November 29th, 2009
    Location
    CO, USA
    Posts
    820
    If you are serving products straight from the database, MySQL is pretty slow, but you can probably get away with it IF all of the data fits into memory, or you have fast disk I/O (a solid-state drive).

    If you are just storing the data and exporting chunks of it to the sites as feeds, you can get away with less, hardware-wise.

    Anyhow, happy to help with any questions I can. This is my area of expertise. We process over 50 million products daily in our 24-server cluster and index them in our custom search-engine back end. We can do it quickly thanks to fast SSDs and lots of RAM.

  17. Thanks From:
    kse

  18. #15
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    All depends on your hosting setup(s), traffic, and number of queries at one time -

    50K products in a single MySQL db is painless. The question is: how many datafeeds/products are you processing to get that 50K?

    Also a possibility that you may run into permission issues if you are on multiple shared hosting accounts - you should contact each hosting provider.

    As Brian mentioned, an SSD server and tons of RAM are the cat's meow - but for most affiliates an SSD is a luxury and a tad overkill. You may want to consider a single dedicated server instead of multiple hosting accounts across different providers...
    Salty kisses, Sandy toes, and a Pirate's heart...

  19. Thanks From:
    kse

  20. #16
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by Prosperent View Post
    We process over 50 million products daily in our 24-server cluster and index them in our custom search-engine back end. We can do it quickly thanks to fast SSDs and lots of RAM.
    Are these numbers right? I'm processing about 100 million items twice a day on a single, beefy server. By beefy I mean a striped array of five SSDs, 143GB of RAM, and 12 CPU cores. It takes about 10 hours to finish a run, though.

    How quickly do you zip through the complete 50 million items?

  21. #17
    Member Prosperent's Avatar
    Join Date
    November 29th, 2009
    Location
    CO, USA
    Posts
    820
    Quote Originally Posted by isellstuff View Post
    Are these numbers right? I'm processing about 100 million items twice a day on a single, beefy server. By beefy I mean a striped array of five SSDs, 143GB of RAM, and 12 CPU cores. It takes about 10 hours to finish a run, though.

    How quickly do you zip through the complete 50 million items?
    I should note that the cluster is doing more than processing. At the same time, the cluster is also handling around 2,000 concurrent requests for data from our API, ads, and other tools. So, to be fair, we have a single server that houses the feeds, which are then cleaned and fed into the db layer for storage, then finally indexed on the front-end app servers that actually serve the data.

    The db layer is similar hardware-wise to yours: RAID 10 arrays of Micron P300 SSDs, 128GB of RAM, and dual hex-core processors, replicated to multiple machines. The front-end machines that house the final search indexes are all dual hex-core with Intel SSDs and 24GB of RAM. The dataset is sharded and distributed across 16 of those servers, which handle the bulk of our requests. We then have a memcache layer with a few hundred GB of RAM for cache, all of it sitting behind some HAProxy load balancers. Oh, and our image layer, which processes and resizes all of the images from the merchants into half a dozen different sizes. These then get fed into a CDN and cached again by CloudFlare in front of the CDN.

    I would imagine our processing time would be similar to yours if we could run it through the cluster without all the traffic (similar hardware).

  23. #18
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Apologies; cloud computing is a concept I'm continually arguing about with friends, who keep asking me: why don't you use Hadoop, S3, etc.? I see you have built your own cloud hardware, which is my first argument against using, say, Amazon's cloud services. Default hardware configs aren't beefy enough to do the job we need done, particularly because we need to process data very quickly.

    Yeah, I'm not rebuilding my search index in 10 hours; I'm just updating prices. A search index rebuild takes much longer.

  24. #19
    Member Prosperent's Avatar
    Join Date
    November 29th, 2009
    Location
    CO, USA
    Posts
    820
    The biggest problem with the "cloud" is I/O. It's terrible on all of the systems I have seen. Well, that and cost. The calculations I have done for S3 would put our costs at almost 10X what we spend on dedicated hardware, and we could cut costs even further if we decided to colo at some point.

  25. #20
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by Prosperent View Post
    The biggest problem with the "cloud" is I/O. It's terrible on all of the systems I have seen. Well, that and cost. The calculations I have done for S3 would put our costs at almost 10X what we spend on dedicated hardware, and we could cut costs even further if we decided to colo at some point.
    This mirrors my calculations. I've looked at colo too, but the savings wouldn't be worth the freedom I would give up. I like having someone on call 24x7 to deal with issues while I am on vacation.

  26. #21
    Member Prosperent's Avatar
    Join Date
    November 29th, 2009
    Location
    CO, USA
    Posts
    820
    We're in the same boat. It's also nice not having to replace a $2,000 solid state drive yourself when it goes bad lol. Make a call, open a ticket and it gets handled no charge.

  27. #22
    OPM and Moderator Chuck Hamrick's Avatar
    Join Date
    April 5th, 2005
    Location
    Park City Utah
    Posts
    16,646
    All I can say is Damn! You guys is hardcore....

  28. #23
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Quote Originally Posted by Chuck Hamrick View Post
    All I can say is Damn! You guys is hardcore....
    Yeah, they're all WAAAAYYYY above my pay grade for sure!

    Gary
    Generate more fake news.

  29. #24
    ABW Ambassador superCool's Avatar
    Join Date
    April 23rd, 2008
    Location
    Texas
    Posts
    1,268
    Quote Originally Posted by kse View Post
    So did you ever set this up?? I'm thinking about setting up one database with fewer than 50K products, and I will use this database on several other sites that are hosted on other servers/hosts. I'm tired of having so many small databases to take care of.

    Note: I will not be using the same products on all the different sites; most products will only be shown on one site (maybe two at the most)
    superCool got everything working pretty well and is using some of it, but never got all the feeds loaded due to the size and processing issues... decided it was not feasible at this time. superCool is small-time and also has no time, so it hasn't gone far. Still moving in that direction, but very slowly. 50K products should not be a problem at all, even for a crappy home PC. The biggest one superCool has is around 200K products and it moves along just fine.

    good luck

    (it's very interesting to hear you pros talking about this ... please carry on )

Similar Threads

  1. It's Supercool Day
    By Trust in forum Virtual Family and Off-Topic
    Replies: 50
    Last Post: March 5th, 2010, 09:22 AM
  2. I miss Supercool
    By guinness618 in forum Virtual Family and Off-Topic
    Replies: 12
    Last Post: December 11th, 2008, 10:45 PM
  3. superCool goes fishing
    By superCool in forum Virtual Family and Off-Topic
    Replies: 17
    Last Post: July 8th, 2008, 10:06 AM
  4. superCool
    By Kevin in forum Virtual Family and Off-Topic
    Replies: 25
    Last Post: May 28th, 2008, 12:22 PM
  5. Hello People - superCool here
    By superCool in forum Introduce Yourself
    Replies: 21
    Last Post: April 25th, 2008, 03:45 PM
