
So I'm very green with Python and am trying to learn by replicating some MATLAB code I've written. I have a part where, in MATLAB, I load a data file that's tab-delimited. The syntax

x = load('data.txt')

takes the tab-delimited data and puts it into the cells of a matrix named x.

Is there a way to do this in Python, but with comma-delimited data?


5 Answers

up vote 15 down vote accepted

There are several methods; choose the one that is most suitable for your application.

If you are working with numpy, it may be a good idea to use one of its loading functions (load, loadtxt, fromfile or genfromtxt), because the file is then read directly into an array, ready for further processing. For delimited text files like this one, loadtxt and genfromtxt are the relevant ones.

But if you are not going to work with numpy (or another large library that has its own file-loading functions), using it just to load a file would be overkill. Consider using built-in Python functions or the csv module from the standard library instead. They are flexible and add no dependencies.

Here is how, with examples using file.txt (the values in each row are separated by tabs):

1   2   3   4
7   8   9   10  11  12
13  14  15

python built-in

No module to import, pretty easy, flexible, a good option for most situations, imho.

Loading the file for reading into a table (a list of lists of values, separated in the file by tabs) with only built-in functions. Text mode, the default for open, is sufficient here:

>>> with open('file.txt') as f:
...     table = [row.strip().split('\t') for row in f]
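Since the question asks about comma-delimited numeric data, the same built-in approach can be sketched with a conversion to float. The file name data.csv, its contents, and the helper name load_table are made up for illustration:

```python
import os
import tempfile


def load_table(path, delimiter=","):
    """Read a delimited text file into a list of lists of floats."""
    table = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            table.append([float(cell) for cell in line.split(delimiter)])
    return table


# Write a small sample file (hypothetical data) so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(sample_path, "w") as f:
    f.write("1,2,3\n4,5,6\n7,8,9\n")

x = load_table(sample_path)
print(x[1][2])  # second row, third column (indices are zero-based)
```

Unlike MATLAB, indices start at 0, and float() raises a ValueError on a cell that is not a number.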

csv

The csv module from the standard library is pretty straightforward as well.

Note that although CSV stands for Comma-Separated Values, there is no strict standard for the format, and you can choose any delimiter you want. The csv module can therefore be used for most cell-oriented or table-like text files.

Loading the file into a table (a list of lists of values, separated in the file by tabs) with the csv reader. In Python 3, open the file in text mode with newline=''; in Python 2, open it in binary mode ('rb') instead:

>>> import csv
>>> with open('file.txt', newline='') as f:
...     table = list(csv.reader(f, delimiter='\t'))
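For comma-delimited numeric data, as in the question, the csv reader can also do the string-to-number conversion itself. The following is a minimal sketch, assuming every field is an unquoted number; the in-memory sample is made up and stands in for a real file:

```python
import csv
import io

# Hypothetical comma-delimited content standing in for a real file.
sample = io.StringIO("1,2,3\n4,5,6\n7,8,9\n")

# QUOTE_NONNUMERIC makes the reader convert every unquoted field to float.
reader = csv.reader(sample, delimiter=",", quoting=csv.QUOTE_NONNUMERIC)
matrix = [row for row in reader]

print(matrix[0])
```

With a real file, pass the object returned by open('data.csv', newline='') instead of the StringIO object. A field that is neither quoted nor numeric makes the reader raise a ValueError.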

Accessing the cells

Both examples above load the table in the same way, and its data can be accessed as table[row][col]. Note that every cell is a string:

>>> table
[['1', '2', '3', '4'], ['7', '8', '9', '10', '11', '12'], ['13', '14', '15']]    
>>> table[0]
['1', '2', '3', '4']
>>> table[1][2]
'9'
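Because the cells are strings and the rows have different lengths, some care is needed before treating the table like a matrix. A possible sketch, where the helper name column and the choice of None as filler are assumptions made for illustration:

```python
# The table as loaded above: every cell is a string,
# and the rows have different lengths.
table = [['1', '2', '3', '4'],
         ['7', '8', '9', '10', '11', '12'],
         ['13', '14', '15']]

# Convert each cell to an integer.
numbers = [[int(cell) for cell in row] for row in table]


def column(rows, index):
    """Return one column, using None where a row is too short."""
    return [row[index] if index < len(row) else None for row in rows]


print(numbers[1][2])
print(column(numbers, 3))
```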

There is a csv module in the standard library.

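See the csv module documentation. Its basic example looks like this in Python 3 (in Python 2, open the file with 'rb' instead of newline='' and use the print statement):

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))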
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
Just a note, but the fact that the source file must be opened in binary mode is something that I feel should be explicitly pointed out (rather than allowing someone to think either binary or text mode is fine); you can get newline-related errors if you don't do so. –  JAB Jun 7 '12 at 18:51

If you're using Python for MATLAB-like purposes, you'll want to use NumPy (and SciPy); in particular, you should read NumPy for MATLAB Users.

If you have comma-delimited data, you can use numpy.loadtxt to read it (after installing numpy, of course):

$ cat matrix.csv 
1,2,3
4,5,6
7,8,9

and then

>>> import numpy as np
>>> m = np.loadtxt("matrix.csv", delimiter=",")
>>> m
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
>>> np.matrix(m)
matrix([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.],
        [ 7.,  8.,  9.]])
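As a small sketch of how the delimiter argument matters: by default loadtxt splits on any whitespace, so tab-delimited data like the question's original file needs no delimiter argument, while comma-delimited data does. The in-memory strings below are made-up stand-ins for real files:

```python
import io

import numpy as np

# Hypothetical data: the same values, once tab-delimited, once comma-delimited.
tab_data = io.StringIO("1\t2\t3\n4\t5\t6\n")
comma_data = io.StringIO("1,2,3\n4,5,6\n")

a = np.loadtxt(tab_data)                   # default: split on whitespace
b = np.loadtxt(comma_data, delimiter=",")  # commas must be named explicitly

print(np.array_equal(a, b))
print(b[:, 0])  # first column; indices are zero-based, unlike MATLAB
```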

See the csv module (specifically the csv.reader function) and/or the numpy.loadtxt function.


The numpy.loadtxt function reads data from a text file into a numpy array. The string that separates the values can be set with the delimiter argument:

numpy.loadtxt('data.txt', delimiter=',')

For more complicated cases, such as files with missing values, the numpy.genfromtxt function is a very good alternative.
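One such case is a file with empty cells. The following is a minimal sketch, assuming comma-delimited float data; the sample content is made up and stands in for a real file:

```python
import io

import numpy as np

# Hypothetical comma-delimited data with one empty cell in the second row.
sample = io.StringIO("1,2,3\n4,,6\n7,8,9\n")

# For float data, genfromtxt fills missing fields with nan by default.
m = np.genfromtxt(sample, delimiter=",")

print(m.shape)
print(np.isnan(m[1, 1]))
```

A different filler can be chosen with the filling_values argument.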

