Maybe I'm just a moron, but there's something very wrong with the Amazon XML download feeds. They say at the top that they are UTF-8 encoded, but it's not true. They contain non-UTF-8 HTML entities, which a strict parser garbles into ...well, take your pick!

The e-acute symbol in Amazon's feeds is represented as

yet when passed through a parser, the text gets garbled to Ã© or whatnot...
It takes a special function to encode all the HTML entities to UTF-8 so that the parser doesn't garble them.
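
For what it's worth, here is a rough sketch of the kind of function I mean, in Python. The names (decode_references, to_utf8_bytes) are made up for illustration and this is not anything Amazon provides. It assumes the feed has already been read into a string, and it deliberately leaves &amp;, &lt; and friends alone so the XML stays well-formed:

```python
import html
import re

# Matches named (&eacute;), decimal (&#233;) and hex (&#xE9;) references.
_REFERENCE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

# Characters that must stay escaped for the XML to remain well-formed.
_XML_SPECIAL = set("<>&\"'")


def decode_references(text: str) -> str:
    """Replace character references with the characters they stand for.

    Hypothetical helper: unknown references and XML's own special
    characters are left exactly as they were.
    """
    def _replace(match: re.Match) -> str:
        original = match.group(0)
        decoded = html.unescape(original)
        if decoded == original or decoded in _XML_SPECIAL:
            return original
        return decoded

    return _REFERENCE.sub(_replace, text)


def to_utf8_bytes(text: str) -> bytes:
    """Decode the references, then encode the result as real UTF-8 bytes."""
    return decode_references(text).encode("utf-8")
```

Running decode_references over the feed text before handing it to the parser means the parser only ever sees literal characters, not references it might mishandle.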

If the feed contains non-UTF-8 characters, then the feed should not state it is UTF-8.

Apologies, the é is not an HTML entity (&eacute;) but the hex equivalent. The point is the same tho.
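
A quick Python check of that, for anyone who wants to poke at it. It assumes the hex form is &#xE9; (the standard hex reference for é, not copied from the feed). It shows the named and hex forms come out as the same character, and one common way (possibly not the only one) to end up with the garbling I described: the UTF-8 bytes getting read back as Latin-1 somewhere along the line.

```python
import html

# Named and hex references for e-acute resolve to the same character.
named = html.unescape("&eacute;")
hex_ref = html.unescape("&#xE9;")

# In UTF-8, that one character is stored as two bytes.
utf8_bytes = hex_ref.encode("utf-8")

# Reading those two bytes as Latin-1 yields two characters instead of one.
garbled = utf8_bytes.decode("latin-1")

print(named == hex_ref, utf8_bytes, garbled)
# prints: True b'\xc3\xa9' Ã©
```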