I am dealing with third-party XML that contains special characters such as bullets, long dashes, etc.
Sample XML:
$xml = "<xml><node>• Special Characters</node></xml>";
My goal is to parse this XML using PHP and insert it into a MySQL database. I am using DOMDocument to parse the XML and then simplexml_import_dom to get a SimpleXMLElement object from the DOM node.
The loadXML method of DOMDocument fails unless I use utf8_encode to encode the XML.
$doc = new DOMDocument();
$doc->loadXML(utf8_encode($xml));
To be able to parse the XML, I understand that I need the utf8_encode function. Once the XML is parsed, however, inserting it into the MySQL table results in the special characters appearing as ? or garbage. The special characters are also garbled if I display the parsed XML in a browser.
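One way to narrow this down is to look at the raw bytes rather than guess. utf8_encode always treats its input as ISO-8859-1, so it only helps if the feed really is ISO-8859-1. The following is a minimal sketch under an assumption that is not confirmed anywhere above: that the third-party feed is Windows-1252, where the bullet is the single byte 0x95. The byte value, the variable names and the use of the mbstring extension are all illustrative.

```php
<?php
// ASSUMPTION: the feed is Windows-1252, so the bullet arrives as byte 0x95.
// This is a hypothetical stand-in for the real third-party XML.
$xml = "<xml><node>\x95 Special Characters</node></xml>";

// If this is false, the string is not valid UTF-8 and needs converting.
$isUtf8 = mb_check_encoding($xml, 'UTF-8');

// Convert from the assumed source encoding instead of using utf8_encode(),
// which would map 0x95 to the control character U+0095 rather than a bullet.
$utf8Xml = $isUtf8 ? $xml : mb_convert_encoding($xml, 'UTF-8', 'Windows-1252');

$doc = new DOMDocument();
$doc->loadXML($utf8Xml);

$sx = simplexml_import_dom($doc);
$text = (string) $sx->node;

// Inspect the first three bytes of the parsed text.
echo bin2hex(substr($text, 0, 3)), "\n"; // e280a2, the UTF-8 bytes of U+2022
```

If the hex dump shows e280a2, the parsed text is a correctly encoded bullet, and any remaining garbage would have to come from a later stage such as the database connection or the page's declared charset.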
The MySQL table column is of the text datatype and uses the latin1_swedish_ci collation. I saw similar questions on SO and tried their solutions, such as running mysql_query('SET NAMES utf8') or changing the column encoding, but they didn't work for me.
Please advise.
$xml come from and what makes you think you need utf8_encode()? – Pekka 웃 Mar 23 '12 at 23:13
utf8_general_ci or something? – hohner Mar 23 '12 at 23:14