  1. #1
    Join Date
    January 18th, 2005
    Identifying "problematic" text strings in datafeeds
    As part of my larger "datafeed project," I am trying to identify common "problems" embedded in datafeeds. I started with a random collection of about 3 dozen datafeeds from ShareASale merchants, and identified 150 "issues" that I'll need to deal with when importing datafeeds. Many of these can be generalized: embedded HTML (including links and image tags), special characters, ampersand-encoded symbols, malformed text or HTML, and improper text.
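
    The categories above can be checked for mechanically. What follows is only a rough sketch: the function name `detect_feed_issues`, the category labels, and the exact patterns are assumptions made for illustration, not anything taken from the chart.

```php
<?php
// Hypothetical helper: report which broad problem categories a feed field contains.
function detect_feed_issues(string $text): array
{
    $issues = [];

    // Anything that looks like an HTML tag, e.g. <b>, <a href="...">, <img ...>
    if (preg_match('/<\/?[a-z][^>]*>/i', $text)) {
        $issues[] = 'embedded_html';
    }
    // Ampersand-encoded symbols, named or numeric, e.g. &amp; or &#8217;
    if (preg_match('/&(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);/i', $text)) {
        $issues[] = 'entity';
    }
    // Control characters other than tab, line feed and carriage return
    if (preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $text)) {
        $issues[] = 'control_char';
    }
    // Any byte outside 7-bit ASCII (accented letters, curly quotes, and so on)
    if (preg_match('/[\x80-\xFF]/', $text)) {
        $issues[] = 'non_ascii';
    }

    return $issues;
}
```

    Running this over every field of a feed and counting the labels gives a quick picture of which decisions a given merchant's data will force.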

    I've posted a chart at (note that characters in the first section of my chart won't display properly in most browsers; use "view source" to figure out what these characters are).

    Not all of these are actually "problems" -- especially the "ampersand" encoded characters -- but each represents a decision I'll need to make while importing data.

    Is anyone else willing to share a chart of "datafeed problems" and/or "substitution maps" that they've used? If you do, I'll be glad to share my list as I develop it.
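
    For anyone unsure what a "substitution map" might look like in practice, here is a minimal sketch. The entries and the name `apply_substitutions` are made up for illustration; each pair is one import decision, and a real map would hold whatever the importer settles on.

```php
<?php
// Hypothetical substitution map: keys are strings to look for, values replace them.
$substitutions = [
    '&reg;'   => '(R)',
    '&trade;' => '(TM)',
    '&nbsp;'  => ' ',
    '&amp;'   => '&',
    '<br>'    => ' ',
];

// strtr() with an array tries the longest keys first and never re-scans
// text it has already replaced, so one substitution cannot trigger another.
function apply_substitutions(string $text, array $map): string
{
    return strtr($text, $map);
}

echo apply_substitutions('Acme&reg; Tools &amp; Dies<br>New', $substitutions), "\n";
```

    Keeping the map as data rather than as a chain of separate replace calls makes it easy to share, and easy to swap per merchant.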

  2. #2
    Chris - AMWSO (Outsourced Program Manager)
    Join Date
    January 18th, 2005
    Hi Mark

    We reviewed a range of these issues some time back and, while I don't have the data any more, I do have the resulting software that we developed to help clean up data feeds.

    One of the biggest challenges we found came from data imported from a client's Word documents, as they would also import a mass of "invisible" formatting characters that cause chaos with feeds.
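
    As a rough illustration of stripping that kind of character: the sketch below assumes the feed is UTF-8, and the list of "invisible" code points (soft hyphen, zero-width characters, direction marks, line and paragraph separators, byte-order mark) is a guess at the usual suspects, not the list used in the software mentioned above. `strip_invisible` is a hypothetical name.

```php
<?php
// Hypothetical helper: remove characters that show nothing on screen but break parsers.
function strip_invisible(string $text): string
{
    // Assumed list: U+00AD soft hyphen, U+200B..U+200F zero-width and direction
    // marks, U+2028/U+2029 line and paragraph separators, U+FEFF byte-order mark.
    $clean = preg_replace('/[\x{00AD}\x{200B}-\x{200F}\x{2028}\x{2029}\x{FEFF}]/u', '', $text);
    if ($clean === null) {
        // preg_replace() returns null when the input is not valid UTF-8;
        // fall back to the original so the control-character pass still runs.
        $clean = $text;
    }

    // ASCII control characters except tab, line feed and carriage return
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $clean) ?? $clean;
}
```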

    We've not worked on the code for a fair while, but I'm sure I could dig it up and have it improved if you have any suggestions.



  3. #3
    sjangro (ABW Ambassador)
    Join Date
    January 17th, 2005
    Mark, I used to have to deal with some particularly messy feeds containing non-printable or extended characters (like curly quotes). I dealt with them using some regular expressions.
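
    Since the original expressions are lost, the following is only a guess at the general shape of such a cleanup, not a reconstruction. It assumes UTF-8 input and that plain ASCII output is acceptable; `ascii_clean` is a made-up name.

```php
<?php
// Hypothetical cleanup: straighten curly quotes, then drop whatever is not plain ASCII.
function ascii_clean(string $text): string
{
    // Curly single and double quotes become their straight equivalents
    $patterns     = ['/[\x{2018}\x{2019}]/u', '/[\x{201C}\x{201D}]/u'];
    $replacements = ["'", '"'];
    $converted = preg_replace($patterns, $replacements, $text);
    if ($converted === null) {
        // Invalid UTF-8: skip the quote conversion and only strip bytes below
        $converted = $text;
    }

    // Keep printable ASCII plus tab, CR and LF; remove every other byte.
    // Without the /u flag this works byte by byte, so a multi-byte character
    // is removed as a whole because none of its bytes are in the kept range.
    return preg_replace('/[^\x20-\x7E\t\r\n]/', '', $converted) ?? '';
}
```

    Note that the last step is lossy: accented letters are deleted rather than transliterated, which may or may not be what a given feed needs.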

    Unfortunately, I've lost track of them. But I just went hunting and found this script, which has a few really great lines of code for cleaning up nasties that are very similar to what I used.

  4. #4
    Full Member
    Join Date
    October 22nd, 2006
    I get rid of the majority of these problems with a few lines of PHP:

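    PHP Code:
    // Convert UTF-8 input to ISO-8859-1 (utf8_decode() is deprecated as of PHP 8.2)
    $data = utf8_decode($data);
    // Mark each list item with a hyphen before the tags are removed
    $data = str_replace("<li>", "<li> - ", $data);
    // Remove all remaining HTML tags
    $data = strip_tags($data);
    // Decode existing entities, then re-encode, so every feed ends up encoded the same way
    $data = html_entity_decode($data, ENT_COMPAT, 'ISO-8859-1');
    $data = htmlentities($data, ENT_COMPAT, 'ISO-8859-1');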
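    I'm sure similar functions either already exist or can easily be created in ASP.
    The str_replace("<li>", "<li> - ", $data); line adds a hyphen in place of the HTML list bullets, which would otherwise disappear when strip_tags() removes the tags.

    Both html_entity_decode and htmlentities are included to handle feeds that contain a bare & as well as feeds that contain &amp;. Decoding first turns every entity into a plain character, and re-encoding then produces the same entities regardless of how the feed started out.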
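
    To make the decode-then-encode point concrete, here is a small sketch. It differs from the snippet above in that it assumes the text stays in UTF-8 rather than being converted to ISO-8859-1, and `normalize_entities` is a name invented for the example.

```php
<?php
// Hypothetical wrapper around the decode/encode pair, assuming UTF-8 text.
function normalize_entities(string $text): string
{
    // Step 1: turn any existing entities (&amp;, &eacute;, ...) into characters
    $decoded = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Step 2: encode everything again, so the output is uniform
    return htmlentities($decoded, ENT_QUOTES, 'UTF-8');
}

// A feed with a bare ampersand and a feed with an encoded one end up identical
$fromRawFeed     = normalize_entities('Salt & Pepper');
$fromEncodedFeed = normalize_entities('Salt &amp; Pepper');
var_dump($fromRawFeed === $fromEncodedFeed);
```

    One limitation worth knowing: a feed that was encoded twice (containing &amp;amp;) is only decoded once, so it would need a second pass.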

  5. #5
    Join Date
    January 18th, 2005
    I've posted an updated list of text substitutions for datafeeds (this is my current list of suggestions to my developer):
