
Hi,

I'm building a website that fetches text from another page and inserts it into the database.

The problem is that all the special characters are saved in the database in their HTML-encoded form, so I then need to convert the output using:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

What I mean is that instead of saving the character " ' ", the HTML version " &#x27; " is saved in the database. The same happens with Spanish characters and other special ones: instead of the letter " ñ ", for example, I get " &ntilde; " saved.

This wastes space in the database, and I also need to convert the output later using the content-type, so:

How can I convert or set the charset before the text is saved, or just let MySQL convert it?

In case you need to know here's how I connect to the database:

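function dbConnect() {
    $conn = new mysqli(DB_SERVER, DB_USER, DB_PASSWORD, DB_NAME);
    // "new mysqli(...) or die()" never fires, so check connect_error instead
    if ($conn->connect_error) {
        die('Error.');
    }
    return $conn;
}

$conn = dbConnect();
$stmt = $conn->stmt_init();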

Hope you can help me!! Thanks.


Agree that storing HTML-encoded data (with no actual markup in it) in the database is totally the Wrong Thing (the amount of extra space it takes not really being the important part of that). Text should stay as plain text until the point it needs to be encoded into some other output format. – bobince Apr 19 '09 at 15:32

3 Answers

up vote 1 down vote accepted

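You can use html_entity_decode() to convert from HTML entities to a (real) character encoding. Pass ENT_QUOTES rather than ENT_COMPAT if, as in your case, the text also contains encoded single quotes such as &#x27;, because ENT_COMPAT leaves those untouched.

<?php echo html_entity_decode("&ntilde;", ENT_QUOTES, "UTF-8"); ?>
ñ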

Please note that "HTML" isn't a character encoding in the usual sense, so it isn't understood by libraries such as iconv, nor by MySQL itself.

I'd also recommend (per the example above) having the whole application use UTF-8. Single-byte character encodings such as ISO-8859-1 are effectively obsolete now that Unicode is so widely supported.
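As a minimal sketch of how this could fit the question's workflow, the fetched text can be decoded once, right before it is stored. The helper name decodeFetchedText and the sample string are made up for illustration; only html_entity_decode() and its flags come from PHP itself.

```php
<?php
// Hypothetical helper: turn HTML-encoded text into plain UTF-8 before storing it.
function decodeFetchedText(string $html): string
{
    // ENT_QUOTES decodes double and single quotes (e.g. &#x27;);
    // ENT_HTML5 selects the HTML5 table of named entities.
    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Sample input, invented for this example.
$fetched = 'Espa&ntilde;a isn&#x27;t far';
$plain = decodeFetchedText($fetched);

echo $plain, PHP_EOL;
```

The decoded string is what should go into the INSERT, so the database holds plain text and the HTML encoding is applied again only when the page is rendered.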

up vote 1 down vote

I suggest using UTF-8 if there are any non-English characters. You can run the SQL

SET NAMES utf8

just after you connect, to make your database connection use UTF-8. (MySQL's name for the charset is utf8, not UTF-8.)

When you do this, you shouldn't use "htmlspecialchars" or "htmlentities" while saving the data.
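With mysqli, the connection charset can also be set through set_charset(), which has the same effect as SET NAMES and additionally tells the client library which charset is in use. The following is only a sketch: the pages table and its body column are hypothetical, utf8mb4 assumes MySQL 5.5.3 or later (use utf8 on older servers), and the DB_* constants are the ones from the question.

```php
<?php
// Sketch: open a mysqli connection and switch it to UTF-8 before any query runs.
function dbConnectUtf8(): mysqli
{
    // On PHP 8.1+ mysqli throws mysqli_sql_exception by default on failure;
    // the explicit checks below cover older versions.
    $conn = new mysqli(DB_SERVER, DB_USER, DB_PASSWORD, DB_NAME);
    if ($conn->connect_error) {
        die('Connection failed: ' . $conn->connect_error);
    }
    if (!$conn->set_charset('utf8mb4')) {
        die('Could not set charset: ' . $conn->error);
    }
    return $conn;
}

// Stores already-decoded plain text; no htmlentities() or htmlspecialchars() here.
// The table `pages` and column `body` are hypothetical names.
function savePlainText(mysqli $conn, string $text): bool
{
    $stmt = $conn->prepare('INSERT INTO pages (body) VALUES (?)');
    if ($stmt === false) {
        return false;
    }
    $stmt->bind_param('s', $text);
    $ok = $stmt->execute();
    $stmt->close();
    return $ok;
}
```

The prepared statement takes care of SQL escaping, so the text needs no HTML encoding at this stage.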

up vote 0 down vote

Maybe you should use htmlspecialchars rather than htmlentities. The former replaces only the HTML special characters &, <, > and ", whereas the latter replaces every character that can be represented by a named character reference.
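The difference is easiest to see side by side. This is a small illustration with an invented sample string; both functions are meant for output time, not for data going into the database.

```php
<?php
// Sample input, invented for this example.
$text = 'ñ & <b>';

// Only the characters with special meaning in HTML are replaced.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Every character that has a named entity is replaced, including ñ.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;   // ñ &amp; &lt;b&gt;
echo $entities, PHP_EOL;  // &ntilde; &amp; &lt;b&gt;
```

If the page is served as UTF-8, htmlspecialchars is enough, because ñ can be sent as-is.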

Can you explain how to use htmlspecialchars in my case? – Jonathan Apr 19 '09 at 11:48
Well how do you store the data into the database? Or are you just reading the data from it? – Gumbo Apr 19 '09 at 11:52
htmlspecialchars doesn't help because it's for encoding HTML entities, not decoding them. – Alnitak Apr 19 '09 at 12:07
But not encoding them in the first place would avoid this problem. – Gumbo Apr 19 '09 at 12:37
