Converting a list of strings into a numpy array in a faster way

Question

br is the name of a list of strings that goes like this:

['14 0.000000 -- (long term 0.000000)\n',
 '19 0.000000 -- (long term 0.000000)\n',
 '22 0.000000 -- (long term 0.000000)\n',
...

I am interested in the first two columns, which I would like to convert to a numpy array. So far, I've come up with the following solution:

x = N.array ([0., 0.])
for i in br:
    x = N.vstack ( (x, N.array (map (float, i.split ()[:2]))) )

This results in a 2-D array:

array([[  0.,   0.],
       [ 14.,   0.],
       [ 19.,   0.],
       [ 22.,   0.],
...

However, since br is rather big (~10^5 entries), this procedure takes some time. I was wondering, is there a way to accomplish the same result, but in less time?

sunetos · Accepted Answer · 2011-08-31 16:47:31Z

This is dramatically faster for me:

import numpy as N

br = ['14 0.000000 -- (long term 0.000000)\n']*50000
aa = N.zeros((len(br), 2))

for i,line in enumerate(br):
    al, strs = aa[i], line.split(None, 2)[:2]
    al[0], al[1] = float(strs[0]), float(strs[1])

Changes:

Preallocate the numpy array (this is big). You already know you want a 2-dimensional array with particular dimensions.
Only split() for the first 2 columns, since you don't want the rest.
Don't use map(): it can be slower than a list comprehension. I didn't even use a list comprehension, since you know you only have 2 columns.
Assign directly into the preallocated array instead of generating new temp arrays as you iterate.
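
The changes listed above can be wrapped in a small reusable function. This is only a sketch: the name first_two_columns and the three-line sample are illustrative assumptions, not part of the original answer, and np here is the same module the thread imports as N. Unlike the loop in the question, it does not start with an extra [0., 0.] row.

```python
import numpy as np

def first_two_columns(lines):
    """Parse the first two whitespace-separated fields of each line as floats.

    Hypothetical helper name, used only for this illustration.
    """
    out = np.zeros((len(lines), 2))      # preallocate the result once
    for i, line in enumerate(lines):
        fields = line.split(None, 2)     # at most 2 splits: the rest stays unsplit
        out[i, 0] = float(fields[0])     # assign straight into the array
        out[i, 1] = float(fields[1])
    return out

sample = ['14 0.000000 -- (long term 0.000000)\n',
          '19 0.000000 -- (long term 0.000000)\n',
          '22 0.000000 -- (long term 0.000000)\n']
result = first_two_columns(sample)
print(result)
```

The loop touches each line exactly once and never builds a temporary array, which is where the repeated vstack in the question spends its time.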

aa = numpy.array([x.split(' ',2)[0:2] for x in br], dtype='float') – steabert Aug 31 '11 at 17:55

Good to know about enumerate: I wasn't aware of it! Thanks also to @steabert for his contribution. The speeds of both solutions seem quite similar to me. – Jir Sep 1 '11 at 8:23
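
To compare the preallocated loop with the list-comprehension one-liner on your own machine, a rough timing sketch along these lines may help. The function names with_loop and with_comprehension are made up for this example, np is the thread's N, and the timings depend on the Python and numpy versions, so none are quoted here.

```python
import timeit
import numpy as np

br = ['14 0.000000 -- (long term 0.000000)\n'] * 50000

def with_loop(lines):
    # Preallocate, then fill row by row (the accepted answer's approach).
    aa = np.zeros((len(lines), 2))
    for i, line in enumerate(lines):
        strs = line.split(None, 2)
        aa[i, 0], aa[i, 1] = float(strs[0]), float(strs[1])
    return aa

def with_comprehension(lines):
    # Build a nested list of numeric strings; numpy converts them to floats.
    return np.array([x.split(' ', 2)[0:2] for x in lines], dtype='float')

# Both approaches should produce the same array.
same = np.array_equal(with_loop(br), with_comprehension(br))

# Time a few repetitions of each; absolute numbers vary between machines.
t_loop = timeit.timeit(lambda: with_loop(br), number=3)
t_comp = timeit.timeit(lambda: with_comprehension(br), number=3)

print("same result:", same)
print("loop: %.3f s, comprehension: %.3f s" % (t_loop, t_comp))
```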

unutbu · Answer 2 · 2011-08-31 16:31:37Z

Changing

map (float, i.split()[:2])

to

map (float, i.split(' ',2)[:2])

might result in a slight speedup. Since you only care about the first two space-separated items in each line, there is no need to split the entire line. The 2 in i.split(' ',2) tells split to make a maximum of 2 splits. For example,

In [11]: x='14 0.000000 -- (long term 0.000000)\n' 

In [12]: x.split()
Out[12]: ['14', '0.000000', '--', '(long', 'term', '0.000000)']

In [13]: x.split(' ',2)
Out[13]: ['14', '0.000000', '-- (long term 0.000000)\n']
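
One caveat worth checking against your data: split(' ', 2) splits on every single space, while split(None, 2) treats any run of whitespace as one separator. The sketch below uses a made-up line with extra spaces (not taken from the question) to show the difference.

```python
# Hypothetical input with a run of three spaces after the first field.
line = '14   0.000000 -- (long term 0.000000)\n'

by_space = line.split(' ', 2)         # each single space counts as a separator
by_whitespace = line.split(None, 2)   # runs of whitespace count as one separator

print(by_space[:2])        # the second item is an empty string
print(by_whitespace[:2])   # the second item is the number we want
```

If the columns may be padded with several spaces or separated by tabs, split(None, 2) is the safer choice; for strictly single-space data, the two behave the same on the first two fields.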

Thanks for the explanation of the second argument of split! — Jir, Sep 1 '11 at 8:21

Simon · Answer 3 · 2011-08-31 16:32:29Z

You can try to preprocess the list of strings (with awk, for example) if they come from a file, and then use numpy.loadtxt or numpy.genfromtxt. If you can't change how you get this list, you have several possibilities:

Give up. If you run this function once a day, speed doesn't matter and your current solution is good enough.
Write an I/O plugin with Cython. There is a big potential gain, because all the loops run in C and you can assign the values directly into a big (10^5, 2) numpy ndarray.
Try another language for this step. With a language such as C or Haskell, you can use ctypes to call the functions compiled into a DLL from Python.
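
If the data does come from a file, numpy's own text reader may be enough without any preprocessing. The following is a minimal sketch, assuming every line has the same whitespace-separated layout as the sample in the question; io.StringIO stands in for an open file, and np is the thread's N. Whether this is faster than a hand-written loop depends on the numpy version, so it is worth timing.

```python
import io
import numpy as np

br = ['14 0.000000 -- (long term 0.000000)\n',
      '19 0.000000 -- (long term 0.000000)\n',
      '22 0.000000 -- (long term 0.000000)\n']

# Stand-in for a real file handle; with a file on disk, pass its path instead.
handle = io.StringIO(''.join(br))

# usecols keeps only the first two columns, so the trailing text is never converted.
x = np.loadtxt(handle, usecols=(0, 1))
print(x)
```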

edit

maybe this approach is slightly faster:

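def conv(mystr):
    # Convert the first two whitespace-separated fields to floats.
    return list(map(float, mystr.split()[:2]))

# list(...) is needed on Python 3, where map() returns a lazy iterator.
br_float = list(map(conv, br))
x = N.array(br_float)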

Liked the 'out-of-the-box' thinking! – Jir Sep 1 '11 at 8:20
