Unix & Linux Stack Exchange is a question and answer site for users of Linux, FreeBSD and other Un*x-like operating systems. Join them; it only takes a minute:

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

I use the xls2csv binary in order to translate XLS doc to CSV in my linux red-hat machine,

example: ( from man page )

 xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

But I noticed about the following problems - steps 1,2 ( the following problems cause many troubles in my bash script )

THE PROBLEMS ARE:

(1) CSV file include unnecessary spaces (on the left side of word or on the right side of word)

Example of wrong syntax in CSV

 ,"/var/adm/sys ldd/all  /Comm/logs   ","WORD "," WORD"

Example of right syntax in csv

 ,"/var/adm/sys ldd/all  /Comm/logs",WORD,WORD

(2) quotation marks appears in the CSV even if word is (one word between separators), in fact we not need quotation marks in case of SINGLE word between the separators ( separator “,” )

Example of wrong syntax in CSV

 ," WORD ",

Example of right syntax in csv

 ,WORD,

Please advice how to solve the problems as described here in order to create "clean csv file" according to steps 1,2

implementation could be with awk,sed ,perl one liner, or any solution under bash script

Example of CSV file before the fix

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,"234245 ",24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt"," 45356",435,"  578 58976  "
 3,"   add this line in crontab    :",34356,"234245 ",24245
 4,"1.0348    54 35.5"," 45356","   435","578 "
 4,"1 2 "," 45356 95857 ","   435","578 "
 5,"1 2 "," 45356 95857 ","   "435","578" "
 6,"1.0348    54 35.5"," 45356"," "4"""    ""35","578 "
 7,"1.0348    54 35.5",""45356",""4"""""35,"578 "

Example of correct CSV file ( after the fix )

 1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
 2,"/var/adm/sys ldd/all
 /Comm/debugs.txt",45356,435,"578 58976"
 3,"add this line in crontab    :",34356,234245,24245
 4,"1.0348    54 35.5",45356,435,578 
 4,"1 2","45356 95857",435,578
 5,"1 2","45356 95857","435,578" 
 6,"1.0348    54 35.5",45356,"4"""    ""35,578
 7,"1.0348    54 35.5",""45356",""4"""""35,578

Commas cannot appear within fields.

Note the explicit newline contained within a field of line 2.

When a field is within double quotes and contains no white space (e.g. line 7 ""45356"), those double quotes must not be removed because that whole field including those quotes is an encoded password

share|improve this question
    
Probably not the solution to your problem, but I think Excel does a better job exporting xls to CSV, did you tried that? – RSFalcon7 Jul 28 '14 at 16:11
    
I use the xls2csv in my bash script , this is the only way to translete XLS to CSV from shell script ( I cant do manual from XLS doc ) the concept here is to automate the proccess – maihabunash Jul 28 '14 at 16:14
    
the problem or the source of the problem here is because the original XLS file have space near the word/s so later I get the space also in CSV , other solution can be with the xls2csv command , but I dont know how to flag this command in order to disable spaces before or after the word , – maihabunash Jul 28 '14 at 16:19
1  
Why can't you accept this syntax on your CSV? – RSFalcon7 Jul 28 '14 at 16:26
4  
I think your problem is that you're processing CSV in bash. You should be using a proper CSV parser that can handle these "problems" you're having. For example, reading the file in a "while read" loop will make it hard to handle embedded newlines (line 2) or lines with commas inside quotes (not present in your example data but likely in the real world). – glenn jackman Jul 28 '14 at 16:38

This perl code produces almost exactly the expected output:

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => $/, allow_loose_quotes => 1, escape_char => undef });

open my $io, "<", $ARGV[0] or die;

while (my $row = $csv->getline ($io)) {
        my @o = map { $_ =~ s,^\s*,,; $_ =~ s,\s*$,,; $_; } @{$row};
        $csv->print(STDOUT, \@o);
}

output is

1,"/var/adm/sys ldd/all  /Comm/logs",34356,234245,24245
2,"/var/adm/sys ldd/all
/Comm/debugs.txt",45356,435,"578 58976"
3,"add this line in crontab    :",34356,234245,24245
4,"1.0348    54 35.5",45356,435,578
4,"1 2","45356 95857",435,578
5,"1 2","45356 95857",""435","578""
6,"1.0348    54 35.5",45356,""4"""    ""35",578
7,"1.0348    54 35.5",""45356",""4"""""35,"578"
share|improve this answer
1  
Don't actually need map and the intermediate variable here. Can s,(^\s+|\s+$),,g for @$row; instead – Sobrique Feb 10 at 20:29

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.