Results 1 to 10 of 10
March 5th, 2006, 02:05 PM #1Need help w/Corrupt Data
I have a database that is tab delimited that I am implementing into a site with a script using php and mysql.
The problem I'm having with this particular file is the are lines that are corrupt so the columns don't match and it throws off my categorys and such...
Is there a way of "cleaning" or detecting a corrupt line?
I was thinking of checking if a value in a column is there and it matches a pattern.
I'm using '\t' to define the tab...I thought maybe that was a problem but when I look at the database in CSV'ed it is not good there either.
I'm emailing the merchant to see if they are aware of this.Suz~~GearGirl~~
March 5th, 2006, 02:56 PM #2
- Join Date
- January 18th, 2005
I have seen this when transferring .zip files in ASCII format. Even if they are text within the ZIP file, move it in binary format.
March 5th, 2006, 03:19 PM #3
I'm accessing it directly through http:url and processing it on my sites host server.
I emailed the merchant and thay are pretty on it so I'm sure I'll get a reply tommorow....just sucks when you have the time to do something and a glitch comes up thats out of your control.
March 5th, 2006, 03:23 PM #4
What I've usually found is that tabs or linefeeds in the descriptions (or other fields) throw things off. The easiest way to check for that is to check the number of tabs in each line. If one has more or less than the usual number, that line is probably wrong. Of course you can check the format of individual fields, as well.
Some feeds are really bad about that.
March 5th, 2006, 04:00 PM #5
I can try counting the amount of tabs in each line and when the count doesn't match the correct amount I can toss that line....
Now to code it
Thanks again for the suggestion.Suz~~GearGirl~~
March 5th, 2006, 07:08 PM #6
I was thinking about how to deal with this as well. I think eventually I need to write a program that runs in windows that can verify, convert and import my datafeeds. I'll be able to give it maps of the different vendors and map their fields to my fields. This is better written for the client side than the server side.
March 10th, 2006, 03:09 AM #7
- Join Date
- January 18th, 2005
In nearly all such cases I noted the problem if I opened a CSV with Excel. There would be invisible tabs that would create numerous columns in some rows and not enough in others. My solution is to use Access to import the CSV and then export to Excel.
March 10th, 2006, 07:42 AM #8
I think this may be the same problem I am running into with the new program Peter is making for me. It all works but the output which is a copy of a line from a merchant provided feed does not always line up.
He is looking at it this weekend.
How can I spot or get rid of these "invisible tabs"?Comments are opinion unless otherwise noted. Remember, pillage first. Then burn. Half of all people in the world have IQs under 100. You best learn to trust ol' SSanf!
May 4th, 2006, 04:49 PM #9
two pence for other readers
- Join Date
- January 4th, 2006
This is common with merchants feeds, what you have to understand is the majority of data feeds are actually just dumps from the merchants website. Now when the merchant makes a data feed they would seperate each field with a tab, comma, semicolon, pipe etc and in turn each record is seperated by a new line (line break).
As the others have already said this breaks down when whoever entered the product information adds a comma as part of a description or product name (these are the two worst offendors). The reason it breaks the whole feed is that the parser or your program that reads it finds a new instance of a comma or whatever seperator is used and obviously thinks it is a new field.
We develop product feed data management software for networks and affiliates and the only reliable way we have of dealing with this is to use validators of the data within the field. Someone suggested counting the amount of tabs, this does not work as there is no reason why feeds cant have more tabs than the amount there should be as long as the merchant "escapes" that tab or seperator with something like / or " before it, this way we know to ignore that tab.
By validating the information inside a field you will find you have no problems with this in the future. By validating a field i mean that some fields you should reasonably expect to find certain types of information, for example a url field should reasonably start with http:// or www. at least. So if either of these are not present at the start of the information in this field then you know that either the information is missing (bin it) or somehow the data has got itself corrupt.
Anyone looking at developing scripts would be better off using validating than counting how many instances of the seperator in use. By doing that not only do you get a correct row of data but you are also checking the data within it is what you expect using your own defined rules.
Hope this is of some help to others
May 4th, 2006, 06:10 PM #10Originally Posted by pricethat~Rhia7 -- Remember the 7
By Nathan Weinberg in forum Google Affiliate Network - GANReplies: 12Last Post: December 8th, 2010, 12:00 PM
By affninja in forum Rakuten LinkShare - LSReplies: 3Last Post: February 22nd, 2009, 03:09 PM
By ecomcity in forum Midnight Cafe'Replies: 41Last Post: January 7th, 2005, 10:03 AM
By swimmer in forum Midnight Cafe'Replies: 0Last Post: May 20th, 2004, 07:21 PM