up vote 0 down vote favorite
3

I am having trouble with PHP regarding encoding.

I have a JavaScript/jQuery HTML5 page interact with my PHP script using $.post. However, PHP is facing a weird problem, probably related to encoding.

When I write

htmlentities("í")

I expect PHP to output í. However, instead it outputs í At the beginning, I thought that I was making some mistake with the encodings, however

htmlentities("í")=="í"?"Good":"Fail";

is outputing "Fail", where

htmlentities("í")=="í"?"Good":"Fail";

But htmlentities($search, null, "utf-8") works as expected.

I want to have PHP communicate with a MySQL server, but it has encoding problems too, even if I use utf8_encode. What should I do?

EDIT: On the SQL command, writing

SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

where XXX contains no í chars, works as expected, but it does not if there is any 'í' char.

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

Not only fails for í chars, but it ALSO fails for strings without any 'special' characters. Removing the ' chars from SET NAMES and SET CHARACTER SET doesn't seem to change anything.

I am connecting to the MySQL database using PDO.

EDIT 2: I am using MySQL version 5.1.30 of XAMPP for Linux.

EDIT 3: Running SHOW VARIABLES LIKE '%character%' from PhpMyAdmin outputs

character_set_client    utf8
character_set_connection    utf8
character_set_database  latin1
character_set_filesystem    binary
character_set_results   utf8
character_set_server    latin1
character_set_system    utf8
character_sets_dir  /opt/lampp/share/mysql/charsets/

Running the same query from my PHP script(with print_r) outputs:

Array
(
    [0] => Array
        (
            [Variable_name] => character_set_client
            [0] => character_set_client
            [Value] => latin1
            [1] => latin1
        )

    [1] => Array
        (
            [Variable_name] => character_set_connection
            [0] => character_set_connection
            [Value] => latin1
            [1] => latin1
        )

    [2] => Array
        (
            [Variable_name] => character_set_database
            [0] => character_set_database
            [Value] => latin1
            [1] => latin1
        )

    [3] => Array
        (
            [Variable_name] => character_set_filesystem
            [0] => character_set_filesystem
            [Value] => binary
            [1] => binary
        )

    [4] => Array
        (
            [Variable_name] => character_set_results
            [0] => character_set_results
            [Value] => latin1
            [1] => latin1
        )

    [5] => Array
        (
            [Variable_name] => character_set_server
            [0] => character_set_server
            [Value] => latin1
            [1] => latin1
        )

    [6] => Array
        (
            [Variable_name] => character_set_system
            [0] => character_set_system
            [Value] => utf8
            [1] => utf8
        )

    [7] => Array
        (
            [Variable_name] => character_sets_dir
            [0] => character_sets_dir
            [Value] => /opt/lampp/share/mysql/charsets/
            [1] => /opt/lampp/share/mysql/charsets/
        )

)

Running

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SHOW VARIABLES LIKE '%character%'

outputs an empty array.

flag

1 Answer

up vote 6 down vote

It's very important to specify the encoding of htmlentities to match that of the input, as you did in your final example but omitted in the first three.

htmlentities($text,ENT_COMPAT,'utf-8');

Regarding communications with MySQL, you need to make sure the connection collation and character set matches the data you are transmitting. You can either set this in the configuration file, or at runtime using the following queries:

SET NAMES utf8;
SET CHARACTER SET utf8;

Make sure the table, database and server character sets match as well. There is one setting you can't change at run-time, and that's the server's character set. You need to modify it in the configuration file:

[mysqld]
character-set-server = utf8
default-character-set = utf8 
skip-character-set-client-handshake

Read more on characters sets and collations in MySQL in the manual.

link|flag
PhpMyAdmin says that the field I am trying to get is encoded using utf8_bin, and I thought that'd be enough. I'll try your solution, though. – luiscubal Jan 1 '09 at 23:51
The field is encoded in UTF, but you need to make sure the connection is using the same encoding (for some reason the default is ISO-8859) – Eran Galperin Jan 1 '09 at 23:54
Thank you. But it's still not working. I've updated my question and added further details. – luiscubal Jan 2 '09 at 0:02
What version of MySQL are you using? – Eran Galperin Jan 2 '09 at 0:18
I'm using 5.1.30 (XAMPP for Linux) /opt/lampp/bin/mysql --version /opt/lampp/bin/mysql Ver 14.14 Distrib 5.1.30, for pc-linux-gnu (i686) using EditLine wrapper I'll add this information to the post. – luiscubal Jan 2 '09 at 0:24
show 17 more comments

Your Answer

get an OpenID
or
never shown

Not the answer you're looking for? Browse other questions tagged or ask your own question.