
I know this has been discussed several times, but it's still driving me crazy. I have a form with a submit.php action. At first I didn't change anything about the charsets and didn't use any UTF-8 header information. The result was that I could read all the ä, ö, ü etc. correctly inside the database. But exporting the data to .csv and importing it into Excel as UTF-8 (I also tested all the other charsets) results in garbled characters.

Now what I tried:

PHP:

header("Content-Type: text/html; charset=utf-8");
$mysqli->set_charset("utf8");

MySQL: I dropped my database and created a new one:

create database db CHARACTER SET utf8 COLLATE utf8_general_ci;
create table ...

I changed my my.cnf and restarted my sql server:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

[mysql]
default-character-set=utf8

If I connect to my db via bash I receive the following output:

| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |

A php test:

var_dump($mysqli->get_charset());

Giving me:

Current character set: utf8 object(stdClass)#3 (8) { ["charset"]=> string(4) "utf8"     ["collation"]=> string(15) "utf8_general_ci" ["dir"]=> string(0) "" ["min_length"]=> int(1) ["max_length"]=> int(3) ["number"]=> int(33) ["state"]=> int(1) ["comment"]=> string(13) "UTF-8 Unicode" }

Now I use:

mysql -uroot -ppw db < require.sql > /tmp/test.csv

require.sql simply contains:

select * from table;

And again I'm unable to import it as a CSV into Excel, no matter whether I choose UTF-8 or anything else. It always gives me garbled characters.

Hopefully someone has a hint about what might have gone wrong here.

Cheers

E: TextMate displays the output correctly, so it seems the conversion actually worked and it's an Excel issue? I'm using Microsoft Office 2011.

E2: I also tried the same thing with latin1; same issue, I cannot import special characters into Excel without breaking them. Any hint or workaround?

E3: I found a workaround which works with the Excel import feature, but not when double-clicking the .csv:

    iconv -f utf8 -t ISO-8859-1 test.csv > test_ISO.csv

Now I'm able to import the CSV into Excel as Windows (ANSI). It's still annoying to have to use the import feature instead of double-clicking. I also really don't get why UTF-8 isn't working, not even with the import feature, with a BOM added and the complete database in UTF-8.
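For what it's worth, the BOM trick does work for some Excel versions when the BOM is the very first bytes of the file. A minimal Python sketch (file name and sample rows are hypothetical) that writes the CSV with Python's "utf-8-sig" encoding, which prepends the UTF-8 BOM automatically:

```python
import csv

# Hypothetical sample rows containing German umlauts
rows = [["name", "street"], ["Müller", "Hauptstraße 5"]]

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF); Excel uses it
# to detect the encoding when the file is double-clicked.
with open("test_bom.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f, delimiter=";")  # semicolon for European Excel
    writer.writerows(rows)
```

Whether double-click then works still depends on the Excel version; older releases (like Office 2011 for Mac) are known to ignore the BOM.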

Comma separation turned out to be a mess as well:

1. CONCAT_WS only works partly, because it adds a concat_ws(..) header row to the .csv file. Also, "file test.csv" doesn't report "comma separated", which means that even though everything is separated by commas, Excel won't recognize it on double-click.

2. sed/awk: I found some code snippets, but all of them split the table badly. E.g. a street column containing "streetname number" ended up as 'streetname','number', which made two columns out of one and wrecked the table.
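The sed/awk failures above are expected: a regex can't tell whether a space or comma sits inside a field or between fields, but a real CSV writer tracks field boundaries and quotes as needed. A small sketch (sample values are hypothetical) showing that a field containing a space, or even the delimiter itself, survives a round trip:

```python
import csv
import io

# One row with a space inside a field and a comma inside a field
row = ["Müller", "Hauptstraße 5", "Berlin, Mitte"]

out = io.StringIO()
writer = csv.writer(out, delimiter=",")
writer.writerow(row)

# The comma-containing field is quoted automatically,
# so it reads back as one column, not two:
back = next(csv.reader(io.StringIO(out.getvalue())))
print(back)  # ['Müller', 'Hauptstraße 5', 'Berlin, Mitte']
```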

So it seems to me that Excel can only open a .csv by double-click if it a) is encoded in ISO-8859-1 (and only under Windows, because the standard Mac charset is Macintosh) and b) is reported by file as "comma separated". This means if I create a .csv through Excel itself, the output of

file test1.csv

would be

test1.csv: ISO-8859 text, with CRLF line terminators

while a file whose charset was changed with iconv and whose commas were added with a regex looks like:

test1.csv: ISO-8859 text

Pretty weird behaviour; maybe someone has a working solution.

Excel notoriously sucks at encodings! Have you opened the CSV file in a text editor to see if it's actually UTF-8? If it is, use the various manual import options in Excel to import the data properly. Otherwise there's a good chance Excel is the only problem. – deceze Jan 28 at 17:01
OpenOffice also gives me the correct output when selecting UTF-8. In Excel I have a bunch of options to select; I used Unicode 5.1 (UTF-8) but also tried everything else. Since my customer is using Excel, I want my .csv to display correctly there. Pretty annoying. – gulty Jan 28 at 17:04
Try adding a BOM to the file (google it). Again, Excel notoriously sucks at encodings. If possible, convert the data to .xls using a good 3rd party converter. Apple's Numbers is pretty good there for starters, OOo will do too. – deceze Jan 28 at 17:05
Thanks for your effort; adding a BOM doesn't make any difference though. – gulty Jan 28 at 17:49

2 Answers

This is how I save data taken from UTF-8 MySQL tables. You need to write the BOM first. Example:

<?php
// Open the target file and emit the UTF-8 BOM before any data
$fp = fopen(dirname(__FILE__).'/'.$filename, 'wb');
fputs($fp, "\xEF\xBB\xBF");
fputcsv($fp, array($utfstr_1, $utfstr_2));
fclose($fp);

Make sure that you also tell MySQL you're going to use UTF-8:

mysql_query("SET CHARACTER SET utf8"); 
mysql_query("SET NAMES utf8"); 

You need to execute this before selecting any data.

It probably won't hurt to also set the locale: setlocale(LC_ALL, "en_US.UTF-8");

Hope it helps.

Sorry, but it is not working. I converted the .csv from UTF-8 to ISO-8859-1. I'm on a Mac, and if I import it as Windows (ANSI) it seems to work, though not by double-clicking the file. Also, comma separation isn't recognized. I used mysql -uroot db_name -e 'select * from table' > name.csv to export everything. I tried several regex expressions for separation, but nothing worked: CONCAT_WS added itself to the header, awk converted ALL whitespace, and the sed regex always cut my table at "t" letters. Since I have whitespace inside columns (streetname nr), nothing really works. – gulty Jan 31 at 16:04
I can send you a CSV file which I created using PHP, saving UTF-8 data from a MySQL table, just to make sure that it's a macOS Office related issue. – KennyPowers Jan 31 at 16:12
Would be lovely, maybe u can upload it since I dunno about sharing emails here :P ty in advance – gulty Jan 31 at 16:47
with double click: s1.directupload.net/images/130131/n374z64p.png as u can see its tab separated. with import utf-8: s1.directupload.net/images/130131/sm5xlwm6.png – gulty Jan 31 at 20:43
(accepted answer)

Thanks everyone for the help. I finally managed to get a working, double-clickable CSV file that opens with correct separation and displays the letters correctly. For those who are interested in a good workflow, here we go:

1.) My database is completely using UTF-8.

2.) I export a form into my database via PHP. I'm using mysqli, and as header information:

header("Content-Type: text/html; charset=ISO-8859");

I know this makes everything look wrong inside the database; feel free to use UTF-8 to make it display correctly, but it doesn't matter in my case.

3.) I wrote a script executed by a cron daemon, which a) removes the .csv files that were created previously:

rm -f path/to/csv  ## I have 3 files due to the renaming steps below

b) creates the new CSV using mysql (this is still UTF-8):

mysql -hSERVERIP -uUSER -pPASS DBNAME -e "select * from DBTABLE;" > PATH/TO/output.csv

Now you have a tab-separated .csv, and (if you exported from PHP in UTF-8) it will display correctly in OpenOffice etc., but not in Excel. Even an import as UTF-8 doesn't work.

c) makes the file SEMICOLON-separated (the Excel default; double-clicking a comma-separated file won't work, at least not with the European version of Excel). I used a small Python script, semicolon.py:

import sys
import csv

# Read tab-separated rows from stdin and write them
# semicolon-separated to stdout
tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
semiout = csv.writer(sys.stdout, delimiter=";")
for row in tabin:
    semiout.writerow(row)

d) The script is then called from my cron shell script:

/usr/bin/python PATH/TO/semicolon.py < output.csv > output_semi.csv

Make sure you use the full path for every file if you run the script from cron.

e) changes the charset from UTF-8 to ISO-8859-1 (the Windows ANSI Excel default) with iconv:

iconv -f utf8 -t ISO-8859-1 output_semi.csv > output_final.csv

And that's it. The CSV opens on double-click in Excel 2010 on both Mac and Windows (tested).
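Incidentally, steps c) to e) could also be collapsed into a single Python script, since the csv module can re-delimit and Python can re-encode in one pass. A sketch, assuming the tab-separated UTF-8 file from step b (the sample input written here is a hypothetical stand-in for the real mysql export):

```python
import csv

# Hypothetical stand-in for the tab-separated UTF-8 export
# produced in step b (mysql ... > output.csv)
with open("output.csv", "w", encoding="utf-8", newline="") as f:
    f.write("name\tstreet\nMüller\tHauptstraße 5\n")

# Steps c-e in one pass: tab -> semicolon, UTF-8 -> ISO-8859-1
with open("output.csv", encoding="utf-8", newline="") as src, \
     open("output_final.csv", "w", encoding="iso-8859-1", newline="") as dst:
    writer = csv.writer(dst, delimiter=";")
    for row in csv.reader(src, delimiter="\t"):
        writer.writerow(row)
```

Note that ISO-8859-1 can only represent Latin-1 characters; anything outside that range would raise a UnicodeEncodeError, which is the same limitation iconv has.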

Maybe this helps someone with similar problems. It drove me crazy.

Edit: on some servers you don't need iconv, because the output from the database is already ISO-8859. You should check your CSV after executing the mysql command:

file output.csv

Use iconv only if the charset isn't already ISO-8859-1.

