  1. #1
    Believe knight01's Avatar
    Join Date
    August 14th, 2006
    Location
    Dayton, Ohio
    Posts
    1,815
    Bad characters in datafeed
    Some feeds have odd, binary-looking characters in the product name or description, e.g. ¿ or äó

    These are creating new lines in my MySQL table and breaking the import routine. In the past I removed them manually or, if there were too many in the feed, I simply didn't include the merchant in the site.

    I'm re-writing some of the import scripts and would like to make them bulletproof so I can truly automate the process.

    After tracking them down on the merchant's site, it appears these start out as copyright or trademark symbols. In a few cases, though, the garbled text appears to be an ordinary word that shouldn't be causing any issues.

    Why and how do these characters get into a datafeed? Is there some way to strip them out or convert them to ASCII so they don't break the import?
    Someday starts today
    Military Discounts

  2. #2
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Boy, I really hope someone out there has an answer to knight01's question here, because I'm eagerly looking for the answer too.

    I have one merchant's feed in particular I'd like to use but it has perpetually had these characters in it -- and I have NEVER gotten anyone to respond with a way to deal with 'em.

    Anyone? Please??
    Generate more fake news.

  3. #3
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Ditto ....

    I had a friend of mine who specializes in this look at some of my datafeeds, and this was his response:

    "The files are UTF-8 encoded - as it should be - but still contains garbled characters. The original accented characters are likely in cp1252 encoding [or some similar western European encoding], but are being written to the feed as UTF-8. Somewhere along the line they've mixed up the encodings. I would contact them, pointing out this bug in the source. - with a screen shot of the issue in Notepad"

    So what we have are merchant feeds that contain mismatched character sets - this doesn't fix the problem, but at least gives you something more specific...
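
    One way to act on that diagnosis is to normalize every field to UTF-8 before importing. The sketch below assumes that any text which is not valid UTF-8 was originally written as cp1252 (Windows-1252); that is a guess about the feed, and the function name to_utf8 is hypothetical. It relies on PHP's mbstring extension being installed.

```php
<?php
// Sketch: normalize one feed field to UTF-8.
// Assumption: text that is not valid UTF-8 was written as Windows-1252 (cp1252).
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it untouched
    }
    // Reinterpret the raw bytes as cp1252 and re-encode them as UTF-8.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

$raw = "Caf\xE9 \xA9 2007"; // cp1252 bytes for "é" and "©"
echo to_utf8($raw), "\n";
```

    This only helps when the feed mixes raw cp1252 bytes into a UTF-8 file. Text that was encoded twice is still valid UTF-8, so it passes through unchanged and needs a different repair.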

    Me, I strip them out before importing using a text utility, but it's a royal pain...

  4. #4
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Quote Originally Posted by teezone
    Ditto ....

    I had a friend of mine who specializes in this look at some of my datafeeds, and this was his response:

    "The files are UTF-8 encoded - as it should be - but still contains garbled characters. The original accented characters are likely in cp1252 encoding [or some similar western European encoding], but are being written to the feed as UTF-8. Somewhere along the line they've mixed up the encodings. I would contact them, pointing out this bug in the source. - with a screen shot of the issue in Notepad"

    So what we have are merchant feeds that contain mismatched character sets - this doesn't fix the problem, but at least gives you something more specific...

    Me, I strip them out before importing using a text utility, but it's a royal pain...
    Yeah, that sounds about like the explanation I got about a year ago on this issue with one of the datafeeds I keep wanting to use.

    Bottom line: The merchants really don't understand, or the people working for the merchants on their df don't understand, or the merchants/people working on their datafeeds simply don't care about the issue. I suppose, looking at it from the merchant's standpoint, there are a ton of more important things they have to do than fix some obscure problem with their df.

    But if you don't care enough to get it right -- why bother offering a df?

  5. #5
    Moderator
    Join Date
    April 6th, 2006
    Posts
    2,689
    Yup, I hear ya... the datafeeds he examined for me were some of the biggest department stores online. Surprising they would be a little sloppy, but I think they consolidate their OWN data from various internal sources (which could cause the inconsistent encoding)

    .... and don't get me started on the inconsistencies between merchants!

    Unfortunately, yes, it has fallen to us to "fix" the feeds..

  6. #6
    Full Member
    Join Date
    March 10th, 2006
    Posts
    466
    Why and how do these characters get into a datafeed? Is there some way to strip these out or convert them to ascii so they are not breaking the import?
    I haven't actually done this myself, but off the bat I would look for a PHP function that returns the decimal value of a character, such as ord(). If the returned value is greater than 127, the character is outside the ASCII range, so discard it.

    Look up ASCII tables online and you'll see each character is assigned a decimal, hexadecimal, or binary value. I prefer decimal.
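
    The decimal-value idea can be sketched with PHP's ord(), which returns a number from 0 to 255 for a single byte; ASCII itself only covers 0 to 127. The function name keep_printable_ascii and the choice to keep only 32 through 126 (printable characters) are assumptions for illustration.

```php
<?php
// Sketch: keep only printable ASCII bytes (decimal 32 through 126).
function keep_printable_ascii(string $text): string
{
    $clean = '';
    $length = strlen($text);        // strlen() counts bytes, not characters
    for ($i = 0; $i < $length; $i++) {
        $code = ord($text[$i]);     // decimal value of this byte, 0..255
        if ($code >= 32 && $code <= 126) {
            $clean .= $text[$i];
        }
    }
    return $clean;
}

echo keep_printable_ascii("Acme\xC2\xAE Widget"), "\n"; // prints "Acme Widget"
```

    Because this drops everything below 32 as well, tabs and line breaks inside a field are removed too.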

    Sorry, don't have a real good answer for it.

  7. #7
    ABW Ambassador
    Join Date
    January 18th, 2005
    Posts
    551
    Contact the merchant. I did and they fixed it.

  8. #8
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Just have your PHP code strip the non-ASCII characters before you import:

    Code:
    $prod_name = preg_replace('/[^(\x20-\x7F)]*/','', $prod_name);
    - Scott
    Hatred stirs up strife, But love covers all transgressions.

  9. #9
    Full Member
    Join Date
    March 10th, 2006
    Posts
    466
    Cool !!!

  10. #10
    Believe knight01's Avatar
    Join Date
    August 14th, 2006
    Location
    Dayton, Ohio
    Posts
    1,815
    Quote Originally Posted by Snib
    Just have your PHP code strip the non-ASCII characters before you import:

    Code:
    $prod_name = preg_replace('/[^(\x20-\x7F)]*/','', $prod_name);
    - Scott
    Count on Snib!

    Thanks Scott, that is exactly what I was looking to do. Not sure what '/[^(\x20-\x7F)]*/' means, but it does seem to get rid of the offending characters.

  11. #11
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Okay, I'm working from a pretty high level of PHP/MySQL ignorance here, personally. LOL!

    Do I understand that this will remove the offending characters from a field/column in my table named prod_name? Or $prod_name? (See -- I told you my ignorance was pretty high-level.)

    The fields I need to remove the non-ASCII characters from are labeled Name and Description. How would I structure Snib's code to do that?

  12. #12
    Believe knight01's Avatar
    Join Date
    August 14th, 2006
    Location
    Dayton, Ohio
    Posts
    1,815
    Gary - open your script and find where the name variable is first used. In Snib's example he used prod_name, but it could be something like name or title.

    Right after that, add this line:
    $variable_name = preg_replace('/[^(\x20-\x7F)]*/','', $variable_name);

    changing variable_name (2 spots) to whatever your script uses.

    Do the same for the description variable.

    If this is the same script you sent me a few months ago, I can peek in to see where these are.

  13. #13
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by knight01
    Count on Snib!

    Thanks Scott, that is exactly what I was looking to do. Not sure what '/[^(\x20-\x7F)]*/' means, but it does seem to get rid of the offending characters.
    No problem. \x20 and \x7F are just the hexadecimal values for ASCII codes 32 (space) and 127. It's a regular expression that says any character outside that range should be replaced with an empty string, i.e. removed.
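
    To make that concrete, here is a small sketch that runs the pattern as posted next to a slightly tightened variant. The tightened pattern and the sample product name are assumptions for illustration, not part of the original post.

```php
<?php
// Pattern as posted: the parentheses inside [...] are literal characters
// (already covered by the range), and "*" also matches empty strings.
$original = '/[^(\x20-\x7F)]*/';

// Tightened variant (an assumption): "+" matches runs of one or more bytes,
// and the range stops at \x7E so the DEL control character (\x7F) goes too.
$tightened = '/[^\x20-\x7E]+/';

// Sample name containing a UTF-8 trademark sign and a stray DEL byte.
$prod_name = "Bible\xE2\x84\xA2 Study\x7F Guide";

echo preg_replace($original, '', $prod_name), "\n";  // DEL byte survives
echo preg_replace($tightened, '', $prod_name), "\n"; // prints "Bible Study Guide"
```

    Both patterns work on bytes, so a multi-byte UTF-8 character is removed one byte at a time. Both also remove tabs and line breaks, since those sit below \x20.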

    Quote Originally Posted by writerguy
    Do I understand that this will remove the offending characters from a field/column in my table named prod_name? Or $prod_name? (See -- I told you my ignorance was pretty high-level.)

    The fields I need to remove the non-ASCII characters from are labeled Name and Description. How would I structure Snib's code to do that?
    This is to be used in conjunction with a script like the bulletproof datafeed example. It involves PHP code that reads in the datafeed one line at a time and inserts or updates each product. This example assumes you've named your product name variable $prod_name.
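
    A rough sketch of that kind of line-at-a-time loop follows. It is not the bulletproof datafeed script itself: the pipe delimiter, the column layout, and the function names strip_non_ascii and read_clean_rows are all assumptions, and the database step is left as a comment.

```php
<?php
// Strip everything outside the ASCII range, using the pattern from this thread.
function strip_non_ascii(string $value): string
{
    return preg_replace('/[^(\x20-\x7F)]*/', '', $value);
}

// Read a pipe-delimited feed one line at a time and clean the given columns.
// $handle is an open stream; $textColumns lists zero-based column indexes.
function read_clean_rows($handle, array $textColumns): array
{
    $rows = [];
    while (($line = fgets($handle)) !== false) {
        $fields = explode('|', rtrim($line, "\r\n"));
        foreach ($textColumns as $index) {
            if (isset($fields[$index])) {
                $fields[$index] = strip_non_ascii($fields[$index]);
            }
        }
        $rows[] = $fields; // a real import would INSERT or UPDATE here instead
    }
    return $rows;
}

// Demo with an in-memory feed; columns are SKU | Name | Description (hypothetical).
$feed = fopen('php://memory', 'w+');
fwrite($feed, "1001|Study Bible\xE2\x84\xA2|Leather\xC2\xBF cover\n");
rewind($feed);
print_r(read_clean_rows($feed, [1, 2]));
fclose($feed);
```

    Cleaning each field after splitting, rather than the whole line before, keeps the delimiter and the line ending out of the regular expression's reach.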

    - Scott

  14. #14
    ABW Ambassador Doug247's Avatar
    Join Date
    January 18th, 2005
    Location
    DE USA
    Posts
    931
    Quote Originally Posted by Snib
    Just have your PHP code strip the non-ASCII characters before you import:

    Code:
    $prod_name = preg_replace('/[^(\x20-\x7F)]*/','', $prod_name);
    - Scott

    Scott,

    Can this be done at the file level after the feed is transferred to your server?

    Thanks,
    Doug

  15. #15
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Quote Originally Posted by knight01
    Gary - open your script and find where the name variable is first used. In Snib's example he used prod_name, but it could be something like name or title.

    Right after that, add this line:
    $variable_name = preg_replace('/[^(\x20-\x7F)]*/','', $variable_name);

    changing variable_name (2 spots) to whatever your script uses.

    Do the same for the description variable.

    If this is the same script you sent me a few months ago, I can peek in to see where these are.
    I think it probably is the same script I sent you a few months ago.

    Thanks for your instructions. I think I can get it -- though I might not have time for a couple of days. (Right now my time's pretty tied up with figuring out how to make a couple of eBay scripts work after their change to the new network setup. LOL!)

  16. #16
    ABW Ambassador Snib's Avatar
    Join Date
    January 18th, 2005
    Location
    Virginia
    Posts
    5,303
    Quote Originally Posted by itsupportnotes
    Scott,

    Can this be done at the file level after the feed is transferred to your server?

    Thanks,
    Doug
    You can do one line at a time if you want to do the whole file, but it still needs to be read in through PHP.
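
    A minimal sketch of the whole-file approach: copy the feed line by line into a cleaned file and import that instead. The function name clean_feed_file and the temporary file paths are hypothetical, and the pattern is a variant that deliberately keeps tabs and line endings so delimiters and row boundaries survive.

```php
<?php
// Copy a feed file line by line, dropping bytes outside printable ASCII.
// Tabs, carriage returns and newlines are kept on purpose.
function clean_feed_file(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'rb');
    $out = fopen($outPath, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Could not open feed files');
    }
    $lines = 0;
    while (($line = fgets($in)) !== false) {
        fwrite($out, preg_replace('/[^\x20-\x7E\t\r\n]+/', '', $line));
        $lines++;
    }
    fclose($in);
    fclose($out);
    return $lines; // number of lines processed
}

// Demo on temporary files (placeholder paths, removed afterwards).
$src = tempnam(sys_get_temp_dir(), 'feed');
$dst = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($src, "1001|Rose\xC2\xAE Bush|Hardy\xE2\x80\x99s pick\n");
echo clean_feed_file($src, $dst), " line(s) cleaned\n";
echo file_get_contents($dst);
unlink($src);
unlink($dst);
```

    Reading with fgets() keeps memory use flat on large feeds, since only one line is held at a time.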

    - Scott

  17. #17
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    FYI -- I tried Snib's line of code on the datafeed that was giving me problems. The good news is -- it worked as it should have.

    The bad news is -- this datafeed had duplicated "real" characters scattered around the non-ASCII characters. It's the International Bible Society (IBS) datafeed at SAS, and I and others have been after them literally for YEARS (at least a couple of years) to clean up the feed. Argh!!!!!!!!!!

  18. #18
    ABW Ambassador bettylou's Avatar
    Join Date
    December 27th, 2005
    Location
    Indiana
    Posts
    595
    Hi Gary,

    That's the feed that I tried to help you with a while ago. I just took another look at it and it's still the same. I tried to run it through a parser program that I had bought for a really tough XML feed. It couldn't even sort it out.

    Didn't someone post something here about them getting a new feed soon? Maybe I am mistaken on that though.

    I don't work with them personally but I have heard that they are a great merchant except for that feed.

  19. #19
    ABW Ambassador writerguy's Avatar
    Join Date
    January 17th, 2005
    Location
    Springfield, Missouri, USA
    Posts
    3,248
    Quote Originally Posted by bettylou
    Hi Gary,

    That's the feed that I tried to help you with a while ago. I just took another look at it and it's still the same. I tried to run it through a parser program that I had bought for a really tough XML feed. It couldn't even sort it out.

    Didn't someone post something here about them getting a new feed soon? Maybe I am mistaken on that though.

    I don't work with them personally but I have heard that they are a great merchant except for that feed.
    LOL! Yes, that's the same feed you were helping with some time back. And, yes, Emilio and Andy both at ARC have been saying, literally for "years" (well probably the last year and a half to 2 years?) that the feed is going to be cleaned up, or has been cleaned up, or will undergo a major change, etc., etc.

    But so far, nothing's happened with it. It's still the same 700 or so items with really poorly structured Names and Descriptions, generally containing sprinkles of non-ASCII characters followed by double apostrophes or double question marks.

    I really love the merchant, in fact my first significant affiliate sale was a $98 commission from an IBS sale out of the blue.

    But it's truly frustrating that nothing's ever yet been done to fix up the feed.

    Ah, well.

  20. #20
    Newbie
    Join Date
    June 7th, 2006
    Posts
    4
    How to remove non-ASCII characters from a ShareASale datafeed
    I am trying to clean the Nature Hills pipe-delimited datafeed of these non-printing characters. Do I have to run the PHP script on the whole file or just on the columns in the file?

    Quote Originally Posted by Snib
    Just have your PHP code strip the non-ASCII characters before you import:
    How do I begin to do this?

    Code:
    $prod_name = preg_replace('/[^(\x20-\x7F)]*/','', $prod_name);
    How do I use this?

    Thanks in advance for the details for dummies.

  21. #21
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    When your PHP script imports a datafeed, use that line replacing the variable in Snib's example with the one your script uses.

  22. #22
    Newbie
    Join Date
    June 7th, 2006
    Posts
    4
    I want to clean up the datafeed on my desktop before I import. Is that possible?

  23. #23
    Visual Artist & ABW Ambassador lostdeviant's Avatar
    Join Date
    September 7th, 2007
    Location
    Cuautitlán, Edo. de México
    Posts
    1,725
    Sure you could. Find & Replace. :-)
