View Poll Results: Manual or Automated Updates?
July 14th, 2004, 01:07 AM #1
For those affiliates who use datafeeds, do you manually update your sites, or have you implemented routines to automatically reflect changes in the datafeeds? For those who don't automatically refresh the datafeeds, do you just avoid data that tends to change (like prices), or do you manually update that data, or do you just let it get out of date?
July 14th, 2004, 03:39 AM #2
The code I wrote has "datafeed specific" details such as the base directory for it and many other configurable items such as the FTP address, userid and password to download it, whether it's zipped, what format it's downloaded in, the column information, where to upload it to, etc.
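As a rough illustration only, the per-datafeed settings listed above could be held in a small Python dataclass. Every field name, the "generated" and "backup" subdirectory names, and the example values are assumptions made for this sketch, not details of the actual utility.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FeedConfig:
    """Per-datafeed settings; all field names here are illustrative."""
    name: str
    base_dir: Path                 # base directory for this datafeed
    ftp_host: str                  # where the feed is downloaded from
    ftp_user: str
    ftp_password: str
    remote_file: str               # file name on the FTP server
    zipped: bool = False           # does the download need unzipping?
    file_format: str = "csv"       # format the feed arrives in
    columns: list[str] = field(default_factory=list)  # column names, in feed order
    upload_dir: str = "/"          # destination directory on the web server

    @property
    def generated_dir(self) -> Path:
        # Assumed layout: generated pages live under the feed's base directory.
        return self.base_dir / "generated"

    @property
    def backup_dir(self) -> Path:
        # Assumed layout: copies of successfully uploaded files.
        return self.base_dir / "backup"


feed = FeedConfig(
    name="example_merchant",
    base_dir=Path("feeds/example_merchant"),
    ftp_host="ftp.example.com",
    ftp_user="user",
    ftp_password="secret",
    remote_file="products.zip",
    zipped=True,
    columns=["Gender", "ProdType", "SubCat1"],
)
```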
Generated code gets put in a "generated directory" off the datafeed's base directory.
The upload facility reads files from the "generated directory" and then looks in the datafeed's backup directory to see if the file exists. If it doesn't, a connection is made (if needed), the file is FTP'd to the appropriate directory on the server and, when that succeeds, a copy of the file is made in the backup directory (which allows for restart and recovery without issues). Process next file....
If the file exists in both directories, both are opened and read into memory and the two files are compared. If nothing changed.... next file. If it did change: connect if needed, copy the file to the server, and make a copy in the backup dir.
With this processing only new or changed files are actually uploaded to the server which saves a ton of bandwidth and time in uploading.
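The upload logic described above might look roughly like this in Python. This is a sketch under assumptions, not the original code: the function name `upload_changed` is made up, and the actual transfer is passed in as an `upload` callable (for example a wrapper around an FTP client) so that the comparison logic stands on its own.

```python
import filecmp
import shutil
from pathlib import Path
from typing import Callable


def upload_changed(generated_dir: Path, backup_dir: Path,
                   upload: Callable[[Path], None]) -> list[str]:
    """Upload only new or changed files; return the names that were uploaded."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    uploaded = []
    for src in sorted(generated_dir.iterdir()):
        if not src.is_file():
            continue
        backup = backup_dir / src.name
        # shallow=False compares file contents instead of trusting
        # size and modification time alone.
        if backup.exists() and filecmp.cmp(src, backup, shallow=False):
            continue  # unchanged since the last successful upload
        upload(src)                # expected to raise if the transfer fails
        shutil.copy2(src, backup)  # record success only after the upload worked
        uploaded.append(src.name)
    return uploaded
```

Because the backup copy is written only after `upload` returns, a run that dies halfway can simply be started again: files that already made it are skipped and the rest are retried.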
I've thought about another change but haven't implemented it yet: delete the original file from the generated directory after it is copied to the backup dir. As time goes on you end up with many files being compared in the generated code dir that are simply old and will never be regenerated. By doing the delete after the backup copy is made, you not only save disk space, you also save time inspecting files on an ongoing basis.
I also have batch processing capabilities and the jobs run unattended by calling a script file to execute whatever commands from the job scheduler. A typical script file looks like:
My utility, when running in unattended mode (which is a command-line parameter), outputs results to a log file that I inspect every day for errors. I make corrections as needed and re-run if necessary.
The download component downloads files, does any file conversions needed, executes the data manipulation queries I've specified and then determines row counts against queries that have been defined (I use the row count information in the generation routines for adjusting the leftover columns on a page as well as for detecting prior queries that are no longer needed and need to be removed).
New datafeeds are run thru a "quick start" facility, where I provide basic template file information and the columns I want to do processing on (e.g. summaries are on [Gender]_[ProdType]_[SubCat1] going to details via [product_name]_sku.html, or whatever), and it then creates all the queries and needed template files for me as well.
A typical "outer summary template", used by the quickstart as the base for creating all summary templates, looks something along the lines of the one below.
<TITLE>[Gender] - [ProdType] - [SubCat1] Page ^(extension_page_number)^ from mysite.com</TITLE>
<META NAME="Description" CONTENT="This page provide access to our selection of [Gender] - [ProdType] - [SubCat1].">
<META NAME="Keywords" CONTENT="^If_Field_Not_Eq_Output([Brand]^^[Brand],)^^MakeKeywords([SubCat1])^,^MakeKeywords([Gender])^,^MakeKeywords([ProdType])^">
<!--#include virtual="/header.html" -->
<b><FONT FACE="Arial" SIZE=2 COLOR="Blue">[Gender] - [ProdType] - [SubCat1] Page ^(extension_page_number)^</font></b></a>
<FONT FACE="arial" SIZE=2 COLOR=black>
<!--#include virtual="/as/^MakeUrlSafe([Gender]_[ProdType]_[SubCat1])^_inc_^(extension_page_number)^.html" -->
<FONT FACE="Arial" SIZE=2><b>
^If_Needed_Create_Extended_Links([^]^black^red^/as/^^, ^<font color=red face=arial>Other Pages for [Gender] : [ProdType] : [SubCat1]</font><br>^
<!--#include virtual="/search_box.html" -->
<!--#include virtual="/footer.html" -->
Clear as mud, right? There is a host of manipulation and formatting options available that can be put in the template files and executed during the generation process.
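To make the bracket placeholders a little less muddy, here is a minimal Python sketch of how `[Column]` fields in a template could be filled from one datafeed row. It covers only the plain `[Field]` substitution; the `^Function(...)^` calls in the template above are not reproduced, and `fill_fields` and `make_url_safe` are hypothetical stand-ins whose behaviour is a guess, not the utility's real rules.

```python
import re

# Matches placeholders such as [Gender] or [SubCat1].
FIELD = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def fill_fields(template: str, row: dict[str, str]) -> str:
    """Replace each [Column] placeholder with that column's value from row.

    Placeholders with no matching column are left untouched so they are
    easy to spot in the generated page.
    """
    return FIELD.sub(lambda m: row.get(m.group(1), m.group(0)), template)


def make_url_safe(text: str) -> str:
    """Assumed behaviour: collapse runs of non-alphanumerics to "_"."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


row = {"Gender": "Mens", "ProdType": "Underwear", "SubCat1": "AthleticUndies"}
title = fill_fields("<TITLE>[Gender] - [ProdType] - [SubCat1]</TITLE>", row)
page = make_url_safe(fill_fields("[Gender]_[ProdType]_[SubCat1]", row)) + "_inc_1.html"
```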
The reorganizing and optimization routines that run as part of the download are a key component, so that generating a query doesn't have to scan the entire file to produce its code. E.g., in the above quick start I specified the primary query data to be based on [Gender]_[ProdType]_[SubCat1], so I rearrange the file into that order as part of the download. That way I can sort the summary queries to be executed and, if there are 300 different queries, execute all 300 in a single pass of the datafeed. On huge datafeeds you don't want to be reading the entire file 300 times! When the [Gender]_[ProdType]_[SubCat1] value changes, I know that's the end of those records, and the next query starts checking for matching rows at or below my current position.
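The single-pass idea can be sketched in Python with `itertools.groupby`, which only compares neighbouring rows and so relies on the feed already being sorted by the summary key. The sample rows and the name `summaries_in_one_pass` are invented for illustration, and the in-memory `sorted` call stands in for the reordering step; a truly huge feed would need an on-disk sort instead.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

KEY_COLUMNS = ("Gender", "ProdType", "SubCat1")
group_key = itemgetter(*KEY_COLUMNS)


def summaries_in_one_pass(rows):
    """Yield (key, rows_for_key) pairs from rows already sorted by KEY_COLUMNS.

    A change in the key marks the end of one summary and the start of the
    next, so no row is read more than once.
    """
    for key, group in groupby(rows, key=group_key):
        yield key, list(group)


feed = io.StringIO(
    "Gender,ProdType,SubCat1,product_name\n"
    "Womens,Shoes,Running,Trail Runner\n"
    "Mens,Underwear,AthleticUndies,Sport Brief\n"
    "Mens,Underwear,AthleticUndies,Sport Boxer\n"
)
# The reordering step: sort the feed once by the summary key.
rows = sorted(csv.DictReader(feed), key=group_key)
counts = {"_".join(key): len(group) for key, group in summaries_in_one_pass(rows)}
```

Here `counts` ends up holding the row count per summary page, which is the kind of figure the generation routines can use for page layout.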
You can bypass the quickstart and optimization routines and enter queries online thru the facility if you'd rather do it that way.
For me, the hardest part is determining all the data manipulation queries I want to execute. From the manipulation queries section of my utility, I can select the field to be manipulated, the condition for it (=, !=, <, >, has or "doesnt have"), the value to compare against, and finally the manipulation script to be performed (yes, a single query can have multiple column conditions, but they are all "and" conditions). So... if I created a manipulation query called "Fix_AthleticUndies" and it had 3 conditions:
Gender = Mens
ProdType = Underwear
SubCat1 = AthleticUndies
and specified that rows meeting these conditions were to call the script fix_AthleticUndies.txt to modify whatever field(s) I wanted to modify, the script would be called after the download has been done and the datafeed optimized.
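One way to picture such a query is the Python sketch below: a list of conditions that must all hold, plus a script to run on matching rows. This is an assumption-laden illustration, not the utility's actual query format. The class names are invented, the script is modelled as a plain Python function rather than a .txt file, the replacement value is made up, and `<` and `>` compare values as strings.

```python
import operator
from dataclasses import dataclass
from typing import Callable

OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,   # string comparison in this sketch
    ">": operator.gt,
    "has": lambda value, text: text in value,
    "doesnt have": lambda value, text: text not in value,
}


@dataclass
class Condition:
    column: str
    op: str
    value: str

    def matches(self, row: dict[str, str]) -> bool:
        return OPERATORS[self.op](row.get(self.column, ""), self.value)


@dataclass
class ManipulationQuery:
    name: str
    conditions: list[Condition]
    script: Callable[[dict[str, str]], None]  # edits the row in place

    def apply(self, rows: list[dict[str, str]]) -> int:
        """Run the script on every row meeting ALL conditions; return the count."""
        changed = 0
        for row in rows:
            if all(c.matches(row) for c in self.conditions):
                self.script(row)
                changed += 1
        return changed


def fix_athletic_undies(row: dict[str, str]) -> None:
    row["SubCat1"] = "Athletic Underwear"  # hypothetical fix


query = ManipulationQuery(
    "Fix_AthleticUndies",
    [Condition("Gender", "=", "Mens"),
     Condition("ProdType", "=", "Underwear"),
     Condition("SubCat1", "=", "AthleticUndies")],
    fix_athletic_undies,
)
rows = [
    {"Gender": "Mens", "ProdType": "Underwear", "SubCat1": "AthleticUndies"},
    {"Gender": "Womens", "ProdType": "Underwear", "SubCat1": "AthleticUndies"},
]
changed = query.apply(rows)
```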
A typical manipulation query is nothing more than :
Another component of my utility is a Datafeed Analysis section that allows me to get the lowdown on a datafeed's specifics "quickly" and figure out how I want to access it.
I've also got a utility for reviewing previously generated files, sorted by date of last generation. I can select a single file or a range of them and it will connect to the server, delete the selected files there, and delete them from the local machine as well. When used on a single file (not a range), you can even launch the browser with a Google search for the file name to see whether it has been scanned or not, and base the decision to delete on the search results returned.
Manual processing sucks! Automated processing allows you to have "many more sites" use many different datafeeds and keep up with them easily.
There are lots of ways to improve productivity and "really good tools" help a bunch! I haven't done any new datafeeds with it in a while now and don't plan on using it in the future other than for maintenance of existing code so I figured I'd be generous and share some of my capabilities with you.
The project I've been working on for some time is shaping up very nicely, so you're welcome to enjoy some of my past work. Perhaps some of you can implement some code and benefit from the concepts presented.
July 14th, 2004, 07:41 AM #3
When I use datafeeds, I totally automate everything. Everything runs on a cron (Unix automated scheduler) and when updated datafeeds are found they are updated into a master database. The pages are built dynamically from the database.
Basically, the steps are:
<UL TYPE=SQUARE>
<LI>Get the datafeed file if it's changed. (I run this as frequently as every 30 minutes--overkill, I know, but I like everything to be as accurate as possible. If nothing has changed, I'm done with virtually no processor or bandwidth usage.)
<LI>Flag all products from that merchant as "pending update" in my database.
<LI>Process the datafeed file, reformatting everything into a standard format. Flag each product in the feed as either "updated" or "added". Expand out the category hierarchy. Create URL-friendly IDs for each product. Etc.
<LI>After processing, any products left as "pending update" are "deletes". Depending on the quantity of them and the possibility of an incomplete datafile, I either delete them, flag them as deleted, or just leave them.
<LI>I then log some stats about the datafeed processed: date/time, adds, deletes, updates, etc.
<LI>I then rank all of the products for that merchant based on a variety of special criteria that help me identify the most desirable products.
<LI>As visitors click on products, I log the clicks so I can determine the most popular products.
<LI>On a regular basis (usually hourly), I calculate the popularity rank and a net rank that takes into consideration both the popularity and the desirability of the product. These ranks help me determine which products to feature.
<LI>My web sites all feed off my database, which allows me to have simple, real-time access to a standardized version of the datafeeds.
</UL>
All this allows me to build a handler for each datafeed and from that point on I never have to touch it. I can dynamically access all the data in a standard format and know that it's as current as can be.
It might sound like a lot of work, but I can usually add a new feed in less than an hour and I never have to touch it again.
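The "pending update" steps above can be sketched with Python's built-in `sqlite3` module. The table layout, the function name `load_feed`, and the sample products are all assumptions for the sake of illustration; the post does not say which database or schema is used. Of the three options mentioned for leftover products, this sketch flags them as deleted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE products (
    merchant TEXT, sku TEXT, name TEXT, price REAL, status TEXT,
    PRIMARY KEY (merchant, sku))""")


def load_feed(db, merchant, feed_rows):
    """Apply one feed to the products table and return row counts per status."""
    with db:  # one transaction: commit on success, roll back on error
        # Assume every existing product is gone until the feed says otherwise.
        db.execute("UPDATE products SET status = 'pending update' "
                   "WHERE merchant = ?", (merchant,))
        for sku, name, price in feed_rows:
            cur = db.execute(
                "UPDATE products SET name = ?, price = ?, status = 'updated' "
                "WHERE merchant = ? AND sku = ?", (name, price, merchant, sku))
            if cur.rowcount == 0:  # no existing row, so this product is new
                db.execute("INSERT INTO products VALUES (?, ?, ?, ?, 'added')",
                           (merchant, sku, name, price))
        # Whatever is still pending was missing from this feed.
        db.execute("UPDATE products SET status = 'deleted' "
                   "WHERE merchant = ? AND status = 'pending update'", (merchant,))
        stats = dict(db.execute(
            "SELECT status, COUNT(*) FROM products "
            "WHERE merchant = ? GROUP BY status", (merchant,)))
    return stats


first = load_feed(db, "acme", [("1", "Sport Brief", 9.99),
                               ("2", "Sport Boxer", 12.50)])
second = load_feed(db, "acme", [("1", "Sport Brief", 8.99),
                                ("3", "Trail Runner", 59.00)])
```

The returned counts are the sort of stats that could be logged after each run. A real handler would probably check the number of leftovers before flagging them, since an incomplete datafile would otherwise mark good products as deleted.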