  1. #1
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Datafeed adding tip
    This tip is for those of you who have a dedicated server and aren't paying for one that is much faster than you really need.

    You know, when my datafeed is stable, i.e. not adding/removing records/products, querying the site is always superfast. If done right, your SELECT should take much less than a hundredth of a second (to be conservative). What slows things down is when products are being added/removed.

    Adding and removing products seems like such an easy thing, and it shouldn't take long. But there is much more going on, and it really all comes down to indices; let me explain. If you know what you are doing, then you know that the key to speed on a database is the proper use of indices, and that is why the queries run so fast. Whether you look up one group of products or 100, 1,000, a million, whatever, they all run at pretty much the same speed, given enough memory.

    But when you add/delete products, that is when your indices are in flux. While they are in flux, all queries slow down, because how are you going to find all the green apples when more apples are being added while you're looking? The query has to pause until the index is stable, or close to it.

    Now, what happens when you load a new datafeed? You have to remove the old products and add the new ones. That doesn't have to mean delete and then add. You can instead modify a field saying, OK, this product isn't in the database, and later update it again to put it back. This is faster than deleting and has the added benefit that you can see whether the price has changed, if you want that. So you issue the command making the block change to this field. If the datafeed is small it takes no time, but if the datafeed is 'big' this can take a while, even though it is still faster than a delete. While it is happening, all your queries back up a little and your load goes up. But it has to be done, and things get back to normal eventually.
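
    A minimal sketch of that flag approach, assuming a hypothetical MySQL products table with merchant_id, sku, name, price, and active columns (none of this is the poster's actual schema):

    Code:
    <?php
    // Hypothetical example data; in practice these come from your feed parser.
    $merchantId = 42;
    $feedRows   = [
        ['sku' => 'A100', 'name' => 'Green Apple', 'price' => 1.99],
        ['sku' => 'A200', 'name' => 'Red Apple',   'price' => 2.49],
    ];

    $pdo = new PDO('mysql:host=localhost;dbname=store', 'user', 'pass');

    // Flag the merchant's existing products as "not in the feed" instead of deleting them.
    $pdo->prepare('UPDATE products SET active = 0 WHERE merchant_id = ?')
        ->execute([$merchantId]);

    // Re-activate (and refresh) everything the new feed contains; rows still flagged
    // inactive afterwards are the ones that dropped out of the feed.
    $upsert = $pdo->prepare(
        'INSERT INTO products (merchant_id, sku, name, price, active)
         VALUES (?, ?, ?, ?, 1)
         ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price), active = 1'
    );
    foreach ($feedRows as $row) {
        $upsert->execute([$merchantId, $row['sku'], $row['name'], $row['price']]);
    }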

    But what if this is happening and you have a second datafeed that needs to be processed? Now you have two things slowing you down. Or let's say your load is up for some other reason. What you should do is check the system load before loading a new datafeed. If the load is too high, wait; if not, go for it. It never occurred to me to check the load before updating, and my site had times of slow going (or even crashing). There is a command out there that gets the load average of the server; if the load is high for whatever reason, it is best to wait until it has come down before updating your datafeeds.
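
    The load check itself is tiny; here is a sketch in PHP using sys_getloadavg(), where the 2.0 threshold is just an example value to tune for your own box:

    Code:
    <?php
    // Skip this import run if the 1-minute load average is too high.
    $load = sys_getloadavg();            // 1-, 5- and 15-minute load averages
    if ($load[0] > 2.0) {                // 2.0 is an arbitrary example threshold
        exit("Load {$load[0]} is too high, skipping this run.\n");
    }
    // ...otherwise go ahead and process the datafeed...
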
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  2. #2
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    If the Google Bot is slamming your site, would your imports be paused indefinitely? I've had really high Google activity for weeks; the bot never leaves.

    Could you run the import on a different thread with a lower priority?

  3. #3
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    isellstuff, you are smarter than that. If Google is hitting your site so hard that it is slowing all your queries, then you have a different problem. Google's bot is well behaved and can be controlled through Webmaster Tools. If your server can handle the load (as it has to), then waiting for a time when it can also handle adding in feeds is fine. If it can't handle the load, then you might need a faster server.

    Either way, having your server slow down while Google is crawling it can't help your placement within Google. There has to be a time when the server can add the products without slowing query times too much. My whole point is to check that the load isn't unreasonably high before adding your datafeed.
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  5. #4
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Sorry oranges, just being chatty. What I was fishing for was whether or not you have code in place to detect starvation, or whether starvation of an import is possible. In other words, how will you know that it is time to upgrade your server? True story: sometimes my imports would fail for weeks before I noticed. I've got notification code in there now, but before that, it wasn't good...

  6. #5
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Well, knowing when it's time to get a new server comes down to the speed of your queries. Of course, if you don't already have things 'optimized', then you might upgrade prematurely, if that is a problem for you.

    Regarding imports, it all depends on the reason for the failure. If you mean that it doesn't add the records it should, then you can generate an email or something. Coley told me once that anytime there is a 20% change in the quantity of records produced from a datafeed, he sends himself an email.
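
    That 20% check might look something like this sketch (the counts and address are placeholders, not Coley's actual code):

    Code:
    <?php
    // Warn if the new feed's record count differs from the last run by more than 20%.
    $previousCount = 10500;   // e.g. stored from the last import
    $currentCount  = 7800;    // records parsed from today's feed

    if ($previousCount > 0) {
        $change = abs($currentCount - $previousCount) / $previousCount;
        if ($change > 0.20) {
            mail('you@example.com',
                 'Datafeed record count changed by ' . round($change * 100) . '%',
                 "Previous: $previousCount, current: $currentCount");
        }
    }
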
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  7. #6
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Speaking of optimization, that is when the slowest queries happen. Never process a datafeed while you are optimizing.
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  8. #7
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    When I talk about starvation, I'm talking about this:

    Dining philosophers problem - Wikipedia, the free encyclopedia

  9. #8
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Starvation isn't a problem in this type of scenario. In the article's setup, one philosopher's left fork is someone else's right fork, and that is how a deadlock can happen. Here, nobody's left fork is anybody else's right fork: everyone's left fork is their own left fork and everyone's right fork is their own right fork, so if you always pick up the left fork first and then the right fork, you can't end up with everyone holding a left fork while no right fork is available.

    But a more common problem is this: when you have complex programs, you might forget to close a file. Every file that is opened must eventually be closed. If you don't, say in one program out of the hundreds you have, then eventually, as the programs run over and over, the open resources run out and you can't open another thing. That will deadlock you. Because even if you have a million forks, if you don't return them after using them, you will eventually run out, and then you do starve. And finding the little fork-hoarding program can take a while... I figured out my solution for this too :-)
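
    The fork-hoarding bug described above is usually just a leaked file handle; here is a small illustration of the open/close discipline that avoids it:

    Code:
    <?php
    // Every fopen() needs a matching fclose(), even on the error path; otherwise
    // scripts that run over and over exhaust the open-file limit and everything
    // that needs a handle starts failing.
    $fh = fopen('/tmp/feed.csv', 'r');       // placeholder path
    if ($fh === false) {
        exit("Could not open the feed file.\n");
    }
    try {
        while (($row = fgetcsv($fh)) !== false) {
            // ... process the row ...
        }
    } finally {
        fclose($fh);                         // released even if processing throws
    }
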
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  10. #9
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    I wasn't referring to a deadlock, rather I was referring to this:
    Resource starvation might also occur independently of deadlock if a particular philosopher is unable to acquire both forks because of a timing problem.
    Here's the problem in code:

    // Pseudocode: if the machine never drops below the load threshold,
    // DoSomeDatafeedWork() never runs; that is the starvation.
    while (true) {
        if (ComputerUnderLoad()) {
            Sleep(someAmountOfTime);
        } else {
            DoSomeDatafeedWork();
        }
    }

  11. #10
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    I don't know how you code your site, but the programs that deal with the datafeed shouldn't be running continuously. They should wake up, see if there are any datafeeds to process, and then die.

    The program doesn't sleep; it ends. If it can't process the datafeed at that moment, it ends, and then cron starts a new one at a later time. Why should it waste resources sleeping when it could just be created, realize it can't do anything, and die, letting the next run deal with that datafeed?
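
    A minimal sketch of that wake-up/check/die pattern driven by cron; the paths, threshold, and queue format here are assumptions, not the poster's actual setup:

    Code:
    <?php
    // Invoked by cron, e.g.:  */5 * * * * php /path/to/process_feed.php
    // It does at most one unit of work, then exits; cron provides the "loop".
    $queueFile = '/var/feeds/queue.txt';          // one pending feed filename per line

    // 1. Bail out immediately if the server is busy.
    $load = sys_getloadavg();
    if ($load[0] > 2.0) {                         // example threshold
        exit;                                     // the next cron run will try again
    }

    // 2. Bail out if there is nothing to do.
    $pending = is_file($queueFile)
        ? file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if (!$pending) {
        exit;
    }

    // 3. Take the first feed off the queue, rewrite the queue, process it, and die.
    $feed = array_shift($pending);
    file_put_contents($queueFile, implode("\n", $pending) . ($pending ? "\n" : ''));
    processDatafeed($feed);

    function processDatafeed(string $path): void {
        // placeholder for the real import routine
    }
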
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  12. #11
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    We do it in what we consider a simpler way.

    When our DBs update, a new table is created with all the products (we import all the merchants during each update).

    When the import/update is completed, the original products table is dropped and, at the same time, the temp table is renamed to the original products table's name.
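
    For reference, that drop-and-rename swap can be done in one atomic step in MySQL; here is a sketch assuming a live products table and a staging products_new table (names are illustrative):

    Code:
    <?php
    // Build the staging table, fill it, then swap it in atomically.
    $pdo = new PDO('mysql:host=localhost;dbname=store', 'user', 'pass');

    $pdo->exec('CREATE TABLE products_new LIKE products');
    // ... bulk-load every merchant's feed into products_new here ...

    // One statement renames both tables, so readers never see an empty table.
    $pdo->exec('RENAME TABLE products TO products_old, products_new TO products');
    $pdo->exec('DROP TABLE products_old');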

    Works for us...
    Salty kisses, Sandy toes, and a Pirate's heart...

  13. #12
    ABW Ambassador isellstuff's Avatar
    Join Date
    November 9th, 2005
    Location
    Virginia
    Posts
    1,659
    Quote Originally Posted by oranges View Post
    If it can't process the datafeed at that moment, it ends
    All I've been asking is how many times in a row do you allow it to end without performing work before it e-mails you a warning?

    We do it in what we consider a simpler way.

    When our DBs update, a new table is created with all the products (we import all the merchants during each update).

    When the import/update is completed, the original products table is dropped and, at the same time, the temp table is renamed to the original products table's name.
    Yup, this is what I do as well.

  15. #13
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    How about running a single import process every minute or two that checks whether the same process is already running? If it is, it dies. This process can check the last-updated timestamp for every datafeed and process the feed most in need of an update. If the MD5 of the datafeed matches the previous MD5, skip it. This process can run during night-time hours so as not to interfere with daytime traffic. You can adjust the frequency based on the number of datafeeds you're dealing with.
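
    A sketch of the last-updated and MD5 checks, assuming a hypothetical feeds table with file_path, last_md5, and last_updated columns (the schema is illustrative):

    Code:
    <?php
    // Pick the feed that has gone the longest without an update.
    $pdo  = new PDO('mysql:host=localhost;dbname=store', 'user', 'pass');
    $feed = $pdo->query(
        'SELECT id, file_path, last_md5 FROM feeds ORDER BY last_updated ASC LIMIT 1'
    )->fetch(PDO::FETCH_ASSOC);
    if (!$feed) {
        exit;                                   // nothing registered yet
    }

    // Skip it if the file hasn't changed since the last import.
    $md5 = md5_file($feed['file_path']);
    if ($md5 === $feed['last_md5']) {
        exit("Feed {$feed['id']} is unchanged, skipping.\n");
    }

    // ... import the feed here, then record the new checksum and timestamp ...
    $pdo->prepare('UPDATE feeds SET last_md5 = ?, last_updated = NOW() WHERE id = ?')
        ->execute([$md5, $feed['id']]);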

    Have you optimized your my.cnf? Are you using InnoDB tables? If you haven't touched these, you're sure to find some optimizations that will improve performance.
    Hatred stirs up strife, But love covers all transgressions.

  16. #14
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Convergence: it may work, but it isn't a good way of doing it. 1) It must take hours to rebuild your entire database; 2) you can't see how prices have changed, since you are losing all the old data; 3) you have to do it all at one time, so your data could be as much as 24 hours out of date for no reason.

    What I do is also keep two sets of tables, a primary and a backup, and when the primary is corrupted I just switch so my sites point to the backup. In your case, when a new datafeed arrived I would keep the two, first update the one that isn't being used, then readjust the pointer so that it becomes the primary, and then update the second one. This will work faster and use fewer resources.
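
    A sketch of that two-table pointer idea, assuming a one-row settings table records which copy of the products table is live (all names here are illustrative):

    Code:
    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=store', 'user', 'pass');

    // Which copy is the site currently reading from: 'products_a' or 'products_b'?
    $active  = $pdo->query("SELECT value FROM settings WHERE name = 'active_products_table'")
                   ->fetchColumn();
    $standby = ($active === 'products_a') ? 'products_b' : 'products_a';

    // Refresh the table the site is NOT reading from...
    $pdo->exec("TRUNCATE TABLE $standby");
    // ... bulk-load the new feed into $standby here ...

    // ...then flip the pointer so queries start using the freshly loaded copy.
    $pdo->prepare("UPDATE settings SET value = ? WHERE name = 'active_products_table'")
        ->execute([$standby]);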

    I am a special case. I like my hosting company; they are inexpensive and responsive to my emails, but for some reason they decided to remove my server's ability to send outgoing mail. I keep an eye on my server while I'm online, though, so it's not a problem. A datafeed not being processed for a while is not a big deal unless there is a systemic problem.

    I had only recently implemented the "don't add a feed if there is a 'large' server load" check. That is why I posted when I did; I thought, hmmm, this is a good idea, others might gain from my insight. If I had thought of it months ago, I would have told you then.
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  17. #15
    The affiliate formerly known as ojmoo
    Join Date
    January 18th, 2005
    Posts
    1,466
    Snib, great to hear from you, my friend :-)

    What I do is: when a datafeed comes to my site, I decompress it and put its name in a file. Then every 5 minutes I check this file to see if any datafeeds need processing, and if the load is right I process one and erase it from the queue. If a feed fails for some reason, I just move on instead of retrying it, because I don't want it to keep failing and never get to the next one. Usually I notice eventually and deal with it. Failures are almost always a temporary problem, and the next feed fixes whatever caused the failure in the first place.
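
    The 'move on instead of retrying' part might look like this sketch, where importFeed() stands in for the real import routine:

    Code:
    <?php
    // Each queued feed gets exactly one attempt; a failure is logged and the loop
    // moves on, so one bad feed can't block the rest of the queue.
    $queue = ['merchantA.csv', 'merchantB.csv', 'merchantC.csv'];   // example queue

    foreach ($queue as $feed) {
        try {
            importFeed($feed);
        } catch (Throwable $e) {
            error_log("Feed $feed failed, moving on: " . $e->getMessage());
        }
    }

    function importFeed(string $path): void {
        // placeholder for the real import logic
    }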

    I just implemented the load solution and it's running very smoothly. I did find that with big files there could be two datafeeds processing at the same time, since one begins before the last one has finished. Instead of doing one every 5 minutes, I am thinking of waiting an hour and then doing all (or say 20-30, whatever) that arrived in that hour, assuming the small ones will balance the large ones and the odds are the batch will be finished before the next hour's grouping starts.

    It's still a work in progress, but I have fixed some things and my site is more stable and superfast. You can start interacting with the site in less than a second, and the page usually finishes downloading in less than 3 seconds. The benchmarking tools I find on the internet say my site is faster than 90+% of the sites out there... most of the time.
    Expert who says Moo

    a.k.a. OJMOO

    Cow Dance


  18. #16
    ...and a Pirate's heart. Convergence's Avatar
    Join Date
    June 24th, 2005
    Posts
    6,918
    Quote Originally Posted by oranges View Post
    Convergence: it may work, but it isn't a good way of doing it. 1) It must take hours to rebuild your entire database; 2) you can't see how prices have changed, since you are losing all the old data; 3) you have to do it all at one time, so your data could be as much as 24 hours out of date for no reason.
    Speak for yourself - works PERFECTLY for us.

    1) We have DOZENS of databases - everything is not in one db.
    2) Could care less about seeing HOW prices have changed.
    3) Huh?
    Salty kisses, Sandy toes, and a Pirate's heart...

  19. #17
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by oranges View Post
    Snib, great to hear from you, my friend :-)

    What I do is: when a datafeed comes to my site, I decompress it and put its name in a file. Then every 5 minutes I check this file to see if any datafeeds need processing, and if the load is right I process one and erase it from the queue. If a feed fails for some reason, I just move on instead of retrying it, because I don't want it to keep failing and never get to the next one. Usually I notice eventually and deal with it. Failures are almost always a temporary problem, and the next feed fixes whatever caused the failure in the first place.
    Sounds like you just need to check the process list to avoid concurrent processes. Then you can run it every minute or two so every feed is processed one after another. You just need to do something like this to get the processes that match your script:

    Code:
    exec("ps aux | grep import_script.php", $pslist);
    Then you just loop through $pslist looking for a match that doesn't have the word "grep" in it so as to avoid a false positive matching against your ps call.
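
    Filled out, that check might look like this (assuming it runs inside import_script.php itself, so one match in the list is always the current process):

    Code:
    <?php
    // If another copy of the import script is already running, exit and let it finish.
    exec('ps aux | grep import_script.php', $pslist);

    $running = 0;
    foreach ($pslist as $line) {
        // The grep command itself shows up in the output; don't count it.
        if (strpos($line, 'grep') === false) {
            $running++;
        }
    }

    // One match is this process; more than one means a previous run is still going.
    if ($running > 1) {
        exit;
    }
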
    Hatred stirs up strife, But love covers all transgressions.

  20. #18
    Newbie
    Join Date
    March 6th, 2014
    Posts
    7
    Quote Originally Posted by oranges View Post
    I don't know how you code your site, but the programs that deal with the datafeed shouldn't be running continuously. They should wake up, see if there are any datafeeds to process, and then die.

    The program doesn't sleep; it ends. If it can't process the datafeed at that moment, it ends, and then cron starts a new one at a later time. Why should it waste resources sleeping when it could just be created, realize it can't do anything, and die, letting the next run deal with that datafeed?
    Hey, I am very much interested in this approach. Is there a piece of software out there which does this? At the moment I have Commission Junction sending me datafeeds (very new to this), but I'm looking for something to extract these files and create a database, or a table in the database, from them.

    I found this which I've considered buying but I cannot get hold of the creator, bit of manual work required: Datafeed to MySQL Database Integration Software | Raakesh.com

    But your approach of using a cron job sounds much more efficient and reliable. Any help pointing me in the right direction would be great and most appreciated!

  21. #19
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    I do very similar things. My datafeed scripts check the system load after every 500 records and pause if the load is too high. I keep a copy of the previous feed, and if nothing has changed, the script just exits.
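
    A sketch of that every-500-records load check; the path, threshold, and sleep interval here are placeholders:

    Code:
    <?php
    // Pause the import whenever the load climbs, checking after every 500 records.
    $fh    = fopen('/path/to/feed.csv', 'r');     // placeholder path
    $count = 0;

    while (($row = fgetcsv($fh)) !== false) {
        // ... insert/update $row here ...

        if (++$count % 500 === 0) {
            // Wait until the 1-minute load average drops below the threshold.
            while (sys_getloadavg()[0] > 2.0) {   // 2.0 is an arbitrary example
                sleep(10);
            }
        }
    }
    fclose($fh);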

    I also check to make sure the previous datafeed script isn't still running. If it is, it doesn't run a second instance.

    On many, I also check the number of columns on each row to catch malformed feeds.

    On some of my feeds, it does a preprocess, and if there is over a certain percentage of changes or deletes, it emails me and I have to manually approve/run it. That keeps a bad or empty datafeed file from wiping that merchant out of my database.
    Michael Coley
    Amazing-Bargains.com
     Affiliate Tips | Merchant Best Practices | Affiliate Friendly? | Couponing | CPA Networks? | ABW Tips | Activating Affiliates
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela
