Does anyone know where I could find some JavaScript code to parse CSV data?

What is your CSV data source? – rahul Aug 18 '09 at 10:56
My CSV is in a textarea so the solution of using CSVToArray() is working fine. – Pilooz Aug 18 '09 at 13:14
Take a look at this answer here, it has good answers: stackoverflow.com/questions/8493195/… – Dobes Vandermeer Feb 8 '12 at 8:17
@Pilooz - seems you have enough points to accept answers? – mplungjan May 16 '12 at 11:40
Most of the answers below are just wrong, aside from the one by Andy. Any answer that uses pattern matching or splits is doomed to fail --they will not support escape sequences. For that, you need a finite state machine. – greg.kindel Jan 17 at 22:33
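To illustrate the state-machine approach mentioned in the comment above, here is a minimal sketch rather than a complete RFC 4180 implementation. It assumes a comma separator, double-quote quoting with "" as the escape, and \n, \r\n or \r as record terminators; `parseCsvFsm` is a hypothetical name, and blank lines come back as a row holding one empty string.

```javascript
// Character-by-character CSV parser with two states: inside or outside quotes.
// parseCsvFsm is a hypothetical name used only for this illustration.
function parseCsvFsm(text) {
    var rows = [];
    var row = [];
    var field = '';
    var inQuotes = false;   // current state of the machine
    var pending = false;    // true once the current record has consumed any input

    for (var i = 0; i < text.length; i++) {
        var ch = text.charAt(i);
        pending = true;

        if (inQuotes) {
            if (ch !== '"') {
                field += ch;                    // commas and line breaks are literal here
            } else if (text.charAt(i + 1) === '"') {
                field += '"';                   // "" inside quotes is an escaped quote
                i++;
            } else {
                inQuotes = false;               // closing quote
            }
        } else if (ch === '"') {
            inQuotes = true;                    // opening quote
        } else if (ch === ',') {
            row.push(field);                    // end of field
            field = '';
        } else if (ch === '\n' || ch === '\r') {
            if (ch === '\r' && text.charAt(i + 1) === '\n') { i++; }
            row.push(field);                    // end of record
            rows.push(row);
            row = [];
            field = '';
            pending = false;
        } else {
            field += ch;
        }
    }

    // Flush the last record unless the input ended with a line break.
    if (pending) {
        row.push(field);
        rows.push(row);
    }
    return rows;
}

var rows = parseCsvFsm('id,name\n1,"Smith, John"\n2,"say ""hi"""');
console.log(JSON.stringify(rows));
// [["id","name"],["1","Smith, John"],["2","say \"hi\""]]
```

Because the quote state is tracked explicitly, commas and line breaks inside quoted fields never get mistaken for separators, which is the case that split-based approaches get wrong.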

6 Answers

up vote 67 down vote accepted

You can use the CSVToArray() function mentioned in this blog entry.

<script type="text/javascript">

    // This will parse a delimited string into an array of
    // arrays. The default delimiter is the comma, but this
    // can be overridden in the second argument.
    function CSVToArray( strData, strDelimiter ){
    	// Check to see if the delimiter is defined. If not,
    	// then default to comma.
    	strDelimiter = (strDelimiter || ",");

    	// Create a regular expression to parse the CSV values.
    	var objPattern = new RegExp(
    		(
    			// Delimiters.
    			"(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +

    			// Quoted fields.
    			"(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +

    			// Standard fields.
    			"([^\"\\" + strDelimiter + "\\r\\n]*))"
    		),
    		"gi"
    		);


    	// Create an array to hold our data. Give the array
    	// a default empty first row.
    	var arrData = [[]];

    	// Create an array to hold our individual pattern
    	// matching groups.
    	var arrMatches = null;


    	// Keep looping over the regular expression matches
    	// until we can no longer find a match.
    	while (arrMatches = objPattern.exec( strData )){

    		// Get the delimiter that was found.
    		var strMatchedDelimiter = arrMatches[ 1 ];

    		// Check to see if the given delimiter has a length
    		// (is not the start of string) and if it matches
    		// field delimiter. If it does not, then we know
    		// that this delimiter is a row delimiter.
    		if (
    			strMatchedDelimiter.length &&
    			(strMatchedDelimiter != strDelimiter)
    			){

    			// Since we have reached a new row of data,
    			// add an empty row to our data array.
    			arrData.push( [] );

    		}


    		// Now that we have our delimiter out of the way,
    		// let's check to see which kind of value we
    		// captured (quoted or unquoted).
    		if (arrMatches[ 2 ]){

    			// We found a quoted value. When we capture
    			// this value, unescape any double quotes.
    			var strMatchedValue = arrMatches[ 2 ].replace(
    				new RegExp( "\"\"", "g" ),
    				"\""
    				);

    		} else {

    			// We found a non-quoted value.
    			var strMatchedValue = arrMatches[ 3 ];

    		}


    		// Now that we have our value string, let's add
    		// it to the data array.
    		arrData[ arrData.length - 1 ].push( strMatchedValue );
    	}

    	// Return the parsed data.
    	return( arrData );
    }

</script>
Sure ! It's cool. Thanks for this. – Pilooz Aug 18 '09 at 11:05
This can handle embedded commas, quotes and line breaks, eg.: var csv = 'id, value\n1, James\n02,"Jimmy Smith, Esq."\n003,"James ""Jimmy"" Smith, III"\n0004,"James\nSmith\nWuz Here"' var array = CSVToArray(csv, ","); – user645715 Jun 6 '12 at 20:27
+1 BTW, for a short non-library-based solution, this is the best implementation I have seen yet (and I have seen dozens/hundreds). – Evan Plaice Oct 6 '12 at 0:47
This has worked for a wide variety of files. I find that it appends an extra blank row at the end. I'm not sure this is the best way to do it, but I suggest adding a check that the new line has at least one value, even if explicitly blank (""): ` if (arrMatches[2] || arrMatches[3]) { arrData.push([]); rowCount++; } else { break; } }` – user645715 Oct 30 '12 at 3:00
It gives undefined for empty fields that are quoted. Example: CSVToArray("4,,6") gives me [["4","","6"]], but CSVToArray("4,\"\",6") gives me [["4",undefined,"6"]]. – Pang Nov 14 '12 at 4:36
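The undefined value reported above appears to come from the truthiness test `if (arrMatches[ 2 ])`: an empty quoted field captures the empty string, which is falsy, so the code falls through to group 3, which did not take part in the match. One possible fix, sketched here as a compact variant of the function (`csvToArrayFixed` is a hypothetical name, not part of the blog entry), compares the quoted group against undefined instead.

```javascript
// Compact variant of CSVToArray(). The quoted-field group is compared against
// undefined rather than tested for truthiness, so "" yields an empty string.
// csvToArrayFixed is a hypothetical name for this sketch.
function csvToArrayFixed(strData, strDelimiter) {
    strDelimiter = strDelimiter || ",";
    var objPattern = new RegExp(
        "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +      // delimiters
        "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +             // quoted fields
        "([^\"\\" + strDelimiter + "\\r\\n]*))",        // standard fields
        "gi"
    );
    var arrData = [[]];
    var arrMatches;
    while ((arrMatches = objPattern.exec(strData))) {
        var strMatchedDelimiter = arrMatches[1];
        // A non-empty delimiter other than the field delimiter starts a new row.
        if (strMatchedDelimiter.length && strMatchedDelimiter !== strDelimiter) {
            arrData.push([]);
        }
        var strMatchedValue;
        if (arrMatches[2] !== undefined) {
            // Quoted value (possibly empty): unescape doubled quotes.
            strMatchedValue = arrMatches[2].replace(/""/g, "\"");
        } else {
            // Unquoted value.
            strMatchedValue = arrMatches[3];
        }
        arrData[arrData.length - 1].push(strMatchedValue);
    }
    return arrData;
}

console.log(JSON.stringify(csvToArrayFixed('4,"",6')));
// [["4","","6"]]
```

This sketch changes only the quoted-field check; the extra blank row after a trailing line break, mentioned in an earlier comment, is not addressed here.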

I think I can sufficiently beat Kirtan's answer

Enter jQuery-CSV

It's a jQuery plugin designed to work as an end-to-end solution for parsing CSV into JavaScript data. It handles every single edge case presented in RFC 4180, as well as some that pop up for Excel/Google Spreadsheet exports (i.e. mostly involving null values) that the spec is missing.

Example:

track,artist,album,year

Dangerous,'Busta Rhymes','When Disaster Strikes',1997

// calling this
music = $.csv.toArrays(csv)

// outputs...
[
  ["track","artist","album","year"],
  ["Dangerous","Busta Rhymes","When Disaster Strikes","1997"]
]

console.log(music[1][1]) // outputs: 'Busta Rhymes'

Update:

Oh yeah, I should also probably mention that it's completely configurable.

music = $.csv.toArrays(csv, {
  delimiter:"'", // sets a custom value delimiter character
  separator:';', // sets a custom field separator character
});

Update 2:

It now works with jQuery on Node.js too. So you have the option of doing either client-side or server-side parsing with the same lib.

Disclaimer: I am also the author of jQuery-CSV.

Why is it jQuery csv? Why does it depend on jQuery? I've had a quick scan through the source... it doesn't look like you're using jQuery – paulslater19 May 10 '12 at 8:20
This is great. Would be useful to extend this to handle embedded line breaks or escaped double quotes, e.g. "James ""Jimmy"" Smith" or "Embedded\nLine\nBreaks" – user645715 Jun 6 '12 at 20:18
@user645715 It handles both. If you take a look at the test runner (jquery-csv.googlecode.com/git/test/test.html) it outlines all of the edge cases that the plugin covers. – Evan Plaice Nov 19 '12 at 20:28
@paulslater19 The plugin doesn't depend on jquery. Rather, it follows the common jQuery development guidelines. All of the methods included are static and reside under their own namespace (ie $.csv). To use them without jQuery simply create a global $ object that the plugin will bind to during initialization. – Evan Plaice Nov 19 '12 at 20:48
@EvanPlaice you are the MASTER!!! +99 if i can :) – bouncingHippo Jan 30 at 19:21

I have an implementation as part of a spreadsheet project.

This code is not yet tested thoroughly, but anyone is welcome to use it.

As some of the answers noted, though, your implementation can be much simpler if you actually have a DSV or TSV file, as those formats disallow the record and field separators inside values. CSV, on the other hand, can have commas and newlines inside a field, which breaks most regex- and split-based approaches.
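For comparison, a split-based parser for tab-separated data could be as short as the sketch below. It relies on the assumption stated above, namely that values never contain a tab or a line break; `parseTsv` is a hypothetical name and not part of the spreadsheet project.

```javascript
// Sketch of a split-based parser for tab-separated data. It is only valid
// if no value contains a tab or a line break.
// parseTsv is a hypothetical name used for illustration.
function parseTsv(text) {
    var lines = text.split(/\r\n|\r|\n/);
    // Ignore the single empty entry produced by a final line break.
    if (lines.length > 0 && lines[lines.length - 1] === '') {
        lines.pop();
    }
    var table = [];
    for (var i = 0; i < lines.length; i++) {
        table.push(lines[i].split('\t'));
    }
    return table;
}

console.log(JSON.stringify(parseTsv('id\tname\r\n1\tJames\r\n')));
// [["id","name"],["1","James"]]
```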

var CSV = {
parse: function(csv, reviver) {
    reviver = reviver || function(r, c, v) { return v; };
    var chars = csv.split(''), c = 0, cc = chars.length, start, end, table = [], row;
    while (c < cc) {
        table.push(row = []);
        while (c < cc && '\r' !== chars[c] && '\n' !== chars[c]) {
            start = end = c;
            if ('"' === chars[c]){
                start = end = ++c;
                while (c < cc) {
                    if ('"' === chars[c]) {
                        if ('"' !== chars[c+1]) { break; }
                        else { chars[++c] = ''; } // unescape ""
                    }
                    end = ++c;
                }
                if ('"' === chars[c]) { ++c; }
                while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) { ++c; }
            } else {
                while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) { end = ++c; }
            }
            row.push(reviver(table.length-1, row.length, chars.slice(start, end).join('')));
            if (',' === chars[c]) { ++c; }
        }
        if ('\r' === chars[c]) { ++c; }
        if ('\n' === chars[c]) { ++c; }
    }
    return table;
},

stringify: function(table, replacer) {
    replacer = replacer || function(r, c, v) { return v; };
    var csv = '', c, cc, r, rr = table.length, cell;
    for (r = 0; r < rr; ++r) {
        if (r) { csv += '\r\n'; }
        for (c = 0, cc = table[r].length; c < cc; ++c) {
            if (c) { csv += ','; }
            cell = replacer(r, c, table[r][c]);
            if (/[,\r\n"]/.test(cell)) { cell = '"' + cell.replace(/"/g, '""') + '"'; }
            csv += (cell || 0 === cell) ? cell : '';
        }
    }
    return csv;
}
};
This is one of my favorite answers. It's a real parser implemented in not a lot of code. – Trevor Dixon Dec 20 '12 at 7:15

Why not just use .split(',') ?

http://www.w3schools.com/jsref/jsref_split.asp

var str="How are you doing today?";
var n=str.split(" "); 
Why is this a bad answer? It is native, places string content into workable array... – Micah Sep 26 '12 at 15:41
Lots of reasons. First, it doesn't remove the double quotes on delimited values. Doesn't handle line splitting. Doesn't escape double-double quotes used to escape double quotes used in delimited values. Doesn't allow empty values. etc, etc... The flexibility of the CSV format makes it very easy to use but difficult to parse. I won't downvote this but only because I don't downvote competing answers. – Evan Plaice Oct 6 '12 at 0:51
Using split can be used to break out line by line, and to break out the line values. Seems like a simple solution, but then again, it is a small code and if used wisely, very powerful. – Micah Oct 16 '12 at 20:06
What about when you encounter a value that contains a newline char? A simple split function will incorrectly interpret it as the end of an entry instead of skipping over it like it should. Parsing CSV is a lot more complicated than just providing 2 split routines (one for newlines, one for delimiters). – Evan Plaice Oct 16 '12 at 21:40
(cont) Also split on null values (a,null,,value) returns nothing whereas it should return an empty string. Don't get me wrong, split is a good start if you are 100% positive that the incoming data won't break the parser, but creating a robust parser that can handle any data that is RFC 4180 compliant is significantly more complicated. – Evan Plaice Oct 16 '12 at 21:45
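The quoting problems raised in these comments can be reproduced with two short inputs. This is only a demonstration of where a plain split goes wrong, using made-up sample data:

```javascript
// A record with 3 fields: a number, a quoted name containing a comma,
// and a quoted value containing escaped ("") quotes.
var line = '1,"Smith, John","He said ""hi"""';

// The comma inside the quoted name splits that field in two, and the
// surrounding and doubled quotes are left in place.
var fields = line.split(',');
console.log(fields.length); // 4, although the record has 3 fields
console.log(fields[1]);     // "Smith
console.log(fields[2]);     //  John"

// Two records, the first of which has a line break inside a quoted field.
var data = 'a,"x\ny"\nb,c';
var records = data.split('\n');
console.log(records.length); // 3, although there are only 2 records
```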

Here's my PEG(.js) grammar that seems to do ok at RFC 4180 (i.e. it handles the examples at http://en.wikipedia.org/wiki/Comma-separated_values):

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Try it out at http://jsfiddle.net/knvzk/10 or http://pegjs.majda.cz/online. Download the generated parser at https://gist.github.com/3362830.

PEG? Isn't building an AST a little memory heavy for a Type III grammar? Can it handle fields that contain newline chars? That's the most difficult case to cover in a 'regular grammar' parser. Either way, +1 for a novel approach. – Evan Plaice Jan 31 at 1:37
Yes, it handles newline inside a field. – Trevor Dixon Jan 31 at 3:52
Nice... With that alone, it's better than 95% of all the implementations I have ever seen. If you want to check for full RFC compliance, take a look at the tests here (jquery-csv.googlecode.com/git/test/test.html). – Evan Plaice Jan 31 at 18:24

I'm not sure why I couldn't get Kirtan's example to work for me. It seemed to be failing on empty fields, or maybe on fields with trailing commas...

This one seems to handle both.

I did not write the parser code, just a wrapper around the parser function to make it work for a whole file. See the attribution in the code comment.

    var Strings = {
        /**
         * Wrapped csv line parser
         * @param s string delimited csv string
         * @param sep separator override
         * @attribution : http://www.greywyvern.com/?post=258 (comments closed on blog :( )
         */
        parseCSV : function(s,sep) {
            // http://stackoverflow.com/questions/1155678/javascript-string-newline-character
            var universalNewline = /\r\n|\r|\n/g;
            var a = s.split(universalNewline);
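            for (var i = 0; i < a.length; i++) {
                for (var f = a[i].split(sep = sep || ","), x = f.length - 1, tl; x >= 0; x--) {
                    if (f[x].replace(/"\s+$/, '"').charAt(f[x].length - 1) == '"') {
                        if ((tl = f[x].replace(/^\s+"/, '"')).length > 1 && tl.charAt(0) == '"') {
                            // Piece both starts and ends with a quote: strip the outer quotes and unescape "" pairs.
                            f[x] = f[x].replace(/^\s*"|"\s*$/g, '').replace(/""/g, '"');
                        } else if (x) {
                            // Piece ends with a quote but does not start with one: merge it with the previous piece.
                            f.splice(x - 1, 2, [f[x - 1], f[x]].join(sep));
                        } else f = f.shift().split(sep).concat(f);
                    } else f[x].replace(/""/g, '"'); // result is discarded, so these pieces are left unchanged
                }
                a[i] = f;
            }
            return a;
        }
    };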