I'm struggling to get string values into an array in Python. I have a file, about 30k entries long, and each row looks like this:

0R1,Sn=0.3M,Sm=0.7M,Sx=1.5M

I don't need the 0R1 part; all I need is all the Sn values in one array, the Sm values in another, and the Sx in another (of course, I haven't figured out how I'm going to get the numerical values out of the string yet, but I'll think about that later). Right now I'm trying to make an array of strings, I suppose.

Here's my code:

fname = '\\pathname...\\WXT51003.txt'
f1 = open(fname, 'r')

import csv
import numpy
from numpy import zeros
reader = csv.reader(f1)
Max = zeros((29697,1), dtype = numpy.str)
Mean = zeros((29697,1), dtype = numpy.str)
Min = zeros((29697,1), dtype = numpy.str)
for i, row in enumerate(reader):
    Min[i] = row[1]
    Mean[i] = row[2]
    Max[i] = row[3]

f1.close()
print Min[0:10]

The output of the print statement is an array with 'S' in every row. How do I get it to read the entire string, and not just the first character?

  • Use a dtype of "S8", or however big you need your strings, or use dtype=object. By default it's going to be a length-1 string type. Or don't use numpy (since you are dealing with strings anyway). Commented Jul 18, 2013 at 21:06
  • @JoranBeasley would that work if different rows have different string lengths? For example, if I have Sn=0.3M, then I have a string length of 7, but if it's Sn=10.1M, then it's a different string length. Commented Jul 18, 2013 at 21:10
  • As long as you set it to the maximum length (see the numpy docs about dtypes). Commented Jul 18, 2013 at 21:11
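The truncation discussed in these comments can be reproduced with a minimal sketch. It assumes Python 3 and a recent NumPy, where the unicode dtype "U8" plays the role that "S8" plays in the comments above; the sample values are made up for illustration.

```python
import numpy as np

# Illustrative values only; the longest one has 8 characters.
values = ["Sn=0.3M", "Sn=10.1M", "Sn=2.0M"]

# A bare string dtype has length 1, so each assignment keeps one character.
short = np.zeros(len(values), dtype=str)
# A fixed width of 8 is enough for the longest value here.
wide = np.zeros(len(values), dtype="U8")
# An object array stores ordinary Python strings of any length.
obj = np.empty(len(values), dtype=object)

for i, value in enumerate(values):
    short[i] = value
    wide[i] = value
    obj[i] = value

# Building the array from the finished list lets numpy pick the width itself.
auto = np.array(values)

print(short)
print(wide)
```

With a fixed-width dtype, any value longer than the declared width is silently cut off, which is why the width has to match the longest string you expect.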

1 Answer

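import csv
import numpy

reader = csv.reader(f1)
rows = list(reader)
# zip(*rows) transposes the rows into columns; list() keeps the result
# indexable on Python 3, where zip returns an iterator
cols = list(zip(*rows))
Min = cols[1]
Mean = cols[2]
Max = cols[3]


# or if you really want numpy.arrays
Min = numpy.array(cols[1]) #dtype will be auto-assigned
Mean = numpy.array(cols[2]) #dtype will be auto-assigned
Max = numpy.array(cols[3]) #dtype will be auto-assigned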

That is how I would do it (without numpy for this, at least not yet).

If you need to use numpy, then use a dtype of "S8", or however big you need your strings, or use dtype=object. By default it's going to be a length-1 string type. But really, I see no reason to use numpy here based on your code snippet.
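For the numeric step the question postpones, here is a rough sketch under stated assumptions. It assumes Python 3, that every field looks like `name=numberM` with a single trailing unit letter, and it uses an in-memory sample in place of the real file. The helper `parse_value` and the second sample row are hypothetical.

```python
import csv
import io

import numpy as np

# Stand-in for the real file; only the first row comes from the question.
sample = io.StringIO(
    "0R1,Sn=0.3M,Sm=0.7M,Sx=1.5M\n"
    "0R1,Sn=10.1M,Sm=12.0M,Sx=14.2M\n"
)


def parse_value(field):
    """Turn 'Sn=0.3M' into 0.3 (hypothetical helper; assumes a one-letter unit)."""
    _, _, value = field.partition("=")
    return float(value[:-1])  # drop the trailing unit letter


# zip(*rows) transposes rows into columns; the first column (0R1) is discarded.
_, mins, means, maxs = zip(*csv.reader(sample))

# Only now is numpy useful: the columns become float arrays.
min_values = np.array([parse_value(field) for field in mins])
mean_values = np.array([parse_value(field) for field in means])
max_values = np.array([parse_value(field) for field in maxs])

print(min_values)
```

Lowercase names are used here so that the built-in min() and max() stay untouched. If some rows have a different number of fields, the unpacking will not line up, so the real file may need a check first.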

  • I would stick to OP's naming. Here, you mask the Python built-in min() and max() functions. Commented Jul 18, 2013 at 21:22
  • oops my bad ... (fixed) Commented Jul 18, 2013 at 21:34
  • Nice answer. Optionally you could rewrite to the oneliner _, Min, Mean, Max = zip(*csv.reader(f1)) which works in Python 3. Commented Jul 18, 2013 at 23:33
