I use the xls2csv binary in order to translate XLS doc to CSV in my linux red-hat machine,
example: ( from man page )
xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8
But I noticed about the following problems - steps 1,2 ( the following problems cause many troubles in my bash script )
THE PROBLEMS ARE:
(1) CSV file include unnecessary spaces (on the left side of word or on the right side of word)
Example of wrong syntax in CSV
,"/var/adm/sys ldd/all /Comm/logs ","WORD "," WORD"
Example of right syntax in csv
,"/var/adm/sys ldd/all /Comm/logs",WORD,WORD
(2) quotation marks appears in the CSV even if word is (one word between separators), in fact we not need quotation marks in case of SINGLE word between the separators ( separator “,” )
Example of wrong syntax in CSV
," WORD ",
Example of right syntax in csv
,WORD,
Please advice how to solve the problems as described here in order to create "clean csv file" according to steps 1,2
implementation could be with awk,sed ,perl one liner, or any solution under bash script
Example of CSV file before the fix
1,"/var/adm/sys ldd/all /Comm/logs",34356,"234245 ",24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt"," 45356",435," 578 58976 "
3," add this line in crontab :",34356,"234245 ",24245
4,"1.0348 54 35.5"," 45356"," 435","578 "
4,"1 2 "," 45356 95857 "," 435","578 "
5,"1 2 "," 45356 95857 "," "435","578" "
6,"1.0348 54 35.5"," 45356"," "4""" ""35","578 "
7,"1.0348 54 35.5",""45356",""4"""""35,"578 "
Example of correct CSV file ( after the fix )
1,"/var/adm/sys ldd/all /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab :",34356,234245,24245
4,"1.0348 54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857","435,578"
6,"1.0348 54 35.5",45356,"4""" ""35,578
7,"1.0348 54 35.5",""45356",""4"""""35,578
Commas cannot appear within fields.
Note the explicit newline contained within a field of line 2
.
When a field is within double quotes and contains no white space (e.g. line 7 ""45356"
), those double quotes must not be removed because that whole field including those quotes is an encoded password