I'm trying to convert an array with values in Brazilian Portuguese to JSON.

Here is an array example:

array(1) {
  ["title"]=>
  string(77) "Cartão Credicard Universitário Visa Crédito "
}

If I use mb_detect_encoding it shows that all values and keys are either in ASCII or UTF8.

However, if I try to use json_encode to generate the JSON, it returns false and json_last_error reports JSON_ERROR_UTF8.

But if I first apply the utf8_encode_deep function ( http://php.net/manual/es/function.utf8-encode.php ) to the array, the JSON is generated without any errors.

The problem with this solution is that it returns certain words with bad codification.

Example:

Word before applying utf8_encode: Cartão (good codification)

Word after applying utf8_encode: Cartão (bad codification)

So although it generates the JSON, it doesn't solve my problem because it messes up the words.
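A minimal sketch of what is probably happening with utf8_encode, assuming the input bytes are already UTF-8 at that point. It uses mb_convert_encoding from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode (itself deprecated as of PHP 8.2). The variable names are illustrative only.

```php
<?php
// "Cartão" as raw bytes; hex escapes keep this independent of the file's own encoding.
$latin1 = "Cart\xE3o";      // ISO-8859-1: ã is the single byte E3
$utf8   = "Cart\xC3\xA3o";  // UTF-8: ã is the two bytes C3 A3

// ISO-8859-1 -> UTF-8 conversion, the same thing utf8_encode() does.
$fixed  = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump($fixed === $utf8);       // latin1 input becomes correct UTF-8
echo bin2hex($double), "\n";      // UTF-8 input gets encoded a second time
echo json_encode($double), "\n";  // valid UTF-8, so no error, but the wrong characters
```

The conversion is only correct when the input really is ISO-8859-1. Applied to text that is already UTF-8 it produces valid but double-encoded output, which json_encode accepts without complaint.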

Here is the code I'm using:

try {
    // No charset is specified in the DSN here.
    $dbh = new PDO("mysql:host=$hostname;dbname=$database;", $username, $password);
    $sql = "SELECT title FROM card";
    $stmt = $dbh->query($sql);

    // Fetch a single row as an associative array and try to encode it.
    $result = $stmt->fetch(PDO::FETCH_ASSOC);
    $json = json_encode($result);
    $error = json_last_error();

    // Dumps false and true when the row contains invalid UTF-8.
    var_dump($json, $error === JSON_ERROR_UTF8);
} catch (PDOException $e) {
    echo 'Connection failed: ' . $e->getMessage() . "\n";
}

If I try to connect to the database using charset=utf8 or charset=utf8mb4, it retrieves Cartão (bad codification), instead of Cartão (good codification)

I have also tried to use JSON_UNESCAPED_UNICODE as parameter of json_encode, but the result remains the same as without using this parameter.
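As a rough illustration of why JSON_UNESCAPED_UNICODE makes no difference here: the flag only changes how valid UTF-8 is written out, not whether the input is accepted. The byte strings below are stand-ins for the database value, not the actual data.

```php
<?php
$utf8   = "Cart\xC3\xA3o";  // valid UTF-8 bytes for "Cartão"
$latin1 = "Cart\xE3o";      // ISO-8859-1 bytes, not valid UTF-8

// With valid UTF-8 the flag only switches between \uXXXX escapes and raw characters.
$escaped   = json_encode($utf8);
$unescaped = json_encode($utf8, JSON_UNESCAPED_UNICODE);

// With invalid UTF-8 the input is rejected regardless of the flag.
$stillFails = json_encode($latin1, JSON_UNESCAPED_UNICODE);
$error      = json_last_error();

var_dump($escaped, $unescaped, $stillFails, $error === JSON_ERROR_UTF8);
```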

Any suggestions?

UPDATE: I've simplified the example with one concrete case where this problem is happening.

UPDATE 2: Added some code in order to clarify the example, also added some explanations about possible solutions in the comments.


closed as off-topic by deceze, Sergiu Paraschiv, Manuel, Raul Rene, Pinal Jul 10 '14 at 12:45

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – deceze, Manuel, Pinal
If this question can be reworded to fit the rules in the help center, please edit the question.

    
Well, where are the values coming from? Can you narrow it down to one specific value that's causing the issue? Once you've narrowed it down, do bin2hex($value) on that value to see its bytes. Check an encoding table if those bytes are correct for UTF-8 for the characters you expect. –  deceze Jul 10 '14 at 7:35
    
The values are coming from a MySQL query where the database and table character set are utf8 and the collation is utf8_general_ci. The specific problem seems to happen only with vowels with tildes: en.wikipedia.org/wiki/Tilde (as in the example shown in my question) –  rfc1484 Jul 10 '14 at 7:43
    
    
    
Definitely a duplicate of stackoverflow.com/questions/279170/utf-8-all-the-way-through –  deceze Jul 10 '14 at 8:40
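The byte check suggested in the comments could look roughly like this. The value is a hard-coded stand-in for the string coming back from the query, and mb_check_encoding is added here as an extra validity check that the comment itself does not mention.

```php
<?php
// Stand-in for the suspicious value; in practice this would come from the query result.
$value = "Cart\xE3o";

// Hex dump of the raw bytes: ã is "c3a3" in UTF-8 and "e3" in ISO-8859-1.
$hex = bin2hex($value);
echo $hex, "\n";

// Strict check of whether the bytes form valid UTF-8.
$isUtf8 = mb_check_encoding($value, 'UTF-8');
var_dump($isUtf8);
```

If the dump shows a lone e3 where the ã should be, the string is ISO-8859-1 and json_encode will reject it.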

1 Answer (accepted, 1 vote)

"If I try to connect to the database using charset=utf8 or charset=utf8mb4, it retrieves Cartão (bad codification), instead of Cartão (good codification)"

Your output is being displayed as latin1, so text that is correctly encoded as UTF-8 is rendered incorrectly.

Add charset=utf8 to the connection string and also set the response charset to UTF-8:

header('Content-Type: text/html;charset=utf-8');
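Put together, the two changes might look like the sketch below. buildDsn and fetchCardAsJson are hypothetical helpers introduced only for illustration, and the table and column names are taken from the question. Because the response here is JSON rather than an HTML page, the usage comment sends application/json instead of text/html; that is a substitution, not part of the answer above. Note also that the charset option in the DSN is ignored by PHP versions before 5.3.6.

```php
<?php
// Hypothetical helper: builds a MySQL DSN with an explicit connection charset.
function buildDsn(string $host, string $database, string $charset = 'utf8'): string
{
    return "mysql:host=$host;dbname=$database;charset=$charset";
}

// Hypothetical helper: fetches one row from the question's table and encodes it.
function fetchCardAsJson(PDO $dbh): string
{
    $stmt = $dbh->query('SELECT title FROM card');
    $row  = $stmt->fetch(PDO::FETCH_ASSOC);

    $json = json_encode($row);
    if ($json === false) {
        // Surface the encoding problem instead of silently returning false.
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return $json;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $dbh = new PDO(buildDsn($hostname, $database), $username, $password);
// header('Content-Type: application/json; charset=utf-8');
// echo fetchCardAsJson($dbh);
```

Passing 'utf8mb4' as the third argument to buildDsn selects MySQL's full four-byte UTF-8 charset instead.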