Results 1 to 10 of 10
  1. #1
    Crazy like a fox suzigeek's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,096
    Need help w/Corrupt Data
    I have a database that is tab delimited that I am implementing into a site with a script using php and mysql.

    The problem I'm having with this particular file is the are lines that are corrupt so the columns don't match and it throws off my categorys and such...

    Is there a way of "cleaning" or detecting a corrupt line?

    I was thinking of checking if a value in a column is there and it matches a pattern.

    I'm using '\t' to define the tab...I thought maybe that was a problem but when I look at the database in CSV'ed it is not good there either.

    Any suggestions?

    I'm emailing the merchant to see if they are aware of this.
    Suz~~GearGirl~~

  2. #2
    Internet Cowboy
    Join Date
    January 18th, 2005
    Posts
    4,662
    I have seen this when transferring .zip files in ASCII format. Even if they are text within the ZIP file, move it in binary format.


  3. #3
    Crazy like a fox suzigeek's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,096
    I'm accessing it directly through http:url and processing it on my sites host server.

    I emailed the merchant and thay are pretty on it so I'm sure I'll get a reply tommorow....just sucks when you have the time to do something and a glitch comes up thats out of your control.

    thanks Scooter!
    Suz~~GearGirl~~

  4. #4
    Moderator MichaelColey's Avatar
    Join Date
    January 18th, 2005
    Location
    Mansfield, TX
    Posts
    16,232
    What I've usually found is that tabs or linefeeds in the descriptions (or other fields) throw things off. The easiest way to check for that is to check the number of tabs in each line. If one has more or less than the usual number, that line is probably wrong. Of course you can check the format of individual fields, as well.

    Some feeds are really bad about that.
    Michael Coley
    Amazing-Bargains.com
     Affiliate Tips | Merchant Best Practices | Affiliate Friendly? | Couponing | CPA Networks? | ABW Tips | Activating Affiliates
    "Education is the most powerful weapon which you can use to change the world." Nelson Mandela

  5. #5
    Crazy like a fox suzigeek's Avatar
    Join Date
    January 18th, 2005
    Posts
    1,096
    Thanks Micheal,

    I can try counting the amount of tabs in each line and when the count doesn't match the correct amount I can toss that line....

    Now to code it

    Thanks again for the suggestion.
    Suz~~GearGirl~~

  6. #6
    ABW Ambassador PatrickAllmond's Avatar
    Join Date
    September 20th, 2005
    Location
    OKC
    Posts
    1,219
    I was thinking about how to deal with this as well. I think eventually I need to write a program that runs in windows that can verify, convert and import my datafeeds. I'll be able to give it maps of the different vendors and map their fields to my fields. This is better written for the client side than the server side.
    ---
    This response was masterly crafted via the fingers of Patrick Allmond who believe you should StopDoingNothing starting today.
    ---
    Focus Consulting is where I roll | Follow @patrickallmond on Twitter
    Search Engine Marketing | Search Engine Optimization | Social Media | Online Video

  7. #7
    Member
    Join Date
    January 18th, 2005
    Location
    Australia
    Posts
    118
    In nearly all such cases I noted the problem if I opened a CSV with Excel. There would be invisible tabs that would create numerous columns in some rows and not enough in others. My solution is to use Access to import the CSV and then export to Excel.

  8. #8
    Super Sh!t Stirrer SSanf's Avatar
    Join Date
    January 18th, 2005
    Posts
    9,944
    I think this may be the same problem I am running into with the new program Peter is making for me. It all works but the output which is a copy of a line from a merchant provided feed does not always line up.

    He is looking at it this weekend.

    How can I spot or get rid of these "invisible tabs"?
    Comments are opinion unless otherwise noted. Remember, pillage first. Then burn. Half of all people in the world have IQs under 100. You best learn to trust ol' SSanf!

  9. #9
    Newbie
    Join Date
    January 4th, 2006
    Location
    Berlin
    Posts
    41
    two pence for other readers
    This is common with merchants feeds, what you have to understand is the majority of data feeds are actually just dumps from the merchants website. Now when the merchant makes a data feed they would seperate each field with a tab, comma, semicolon, pipe etc and in turn each record is seperated by a new line (line break).

    As the others have already said this breaks down when whoever entered the product information adds a comma as part of a description or product name (these are the two worst offendors). The reason it breaks the whole feed is that the parser or your program that reads it finds a new instance of a comma or whatever seperator is used and obviously thinks it is a new field.

    We develop product feed data management software for networks and affiliates and the only reliable way we have of dealing with this is to use validators of the data within the field. Someone suggested counting the amount of tabs, this does not work as there is no reason why feeds cant have more tabs than the amount there should be as long as the merchant "escapes" that tab or seperator with something like / or " before it, this way we know to ignore that tab.

    By validating the information inside a field you will find you have no problems with this in the future. By validating a field i mean that some fields you should reasonably expect to find certain types of information, for example a url field should reasonably start with http:// or www. at least. So if either of these are not present at the start of the information in this field then you know that either the information is missing (bin it) or somehow the data has got itself corrupt.

    Anyone looking at developing scripts would be better off using validating than counting how many instances of the seperator in use. By doing that not only do you get a correct row of data but you are also checking the data within it is what you expect using your own defined rules.

    Hope this is of some help to others

    Matt Yabsley

  10. #10
    What's the word? Rhia7's Avatar
    Join Date
    January 13th, 2006
    Posts
    9,578
    Quote Originally Posted by pricethat
    This is common with merchants feeds, what you have to understand is the majority of data feeds are actually just dumps from the merchants website...
    Yeah, and some networks charge for the dumps
    ~Rhia7 -- Remember the 7
    Twitter me

  11. Newsletter Signup

+ Reply to Thread

Similar Threads

  1. Corrupt Excel files in Windows 7
    By Nathan Weinberg in forum Google Affiliate Network - GAN
    Replies: 12
    Last Post: December 8th, 2010, 12:00 PM
  2. LinkShare Datafeed Files Corrupt?
    By affninja in forum Rakuten LinkShare - LS
    Replies: 3
    Last Post: February 22nd, 2009, 03:09 PM
  3. PayDay Loans looks to be a corrupt verticle to push
    By ecomcity in forum Midnight Cafe'
    Replies: 41
    Last Post: January 7th, 2005, 10:03 AM
  4. Reporting RICO (corrupt organizations) tips
    By swimmer in forum Midnight Cafe'
    Replies: 0
    Last Post: May 20th, 2004, 07:21 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •