
I have a NumPy array of about 2500 data points. This function is called on a rolling basis, with 363 data points passed at a time.

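import numpy as np

def fcn(data):
    # Ratio of each point to the mean of its trailing 3-point window, minus 1,
    # walking backwards from the last element (361 values in total).
    a = [data[i]/np.mean(data[i-2:i+1])-1 for i in range(len(data)-1, len(data)-362, -1)]
    return a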

This takes about 5 seconds to run. I think the bottleneck is the array slicing. Any thoughts on how to speed this up?


2 Answers

up vote 3 down vote accepted

In Python 2, range returns a list. You could use xrange instead.

range([start,] stop[, step]) -> list of integers

Return a list containing an arithmetic progression of integers.

vs

xrange([start,] stop[, step]) -> xrange object

Like range(), but instead of returning a list, returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient.

The other thing that strikes me is the slice in the argument to np.mean. The slice is always of length 3. Assuming this is an arithmetic mean, you could turn the division into

(3.0 * data[i] / (data[i - 2] + data[i - 1] + data[i]))

So putting it together

def fcn(data):
    return [(3.0 * data[i] / (data[i - 2] + data[i - 1] + data[i])) - 1
            for i in xrange(len(data) - 1, len(data) - 362, -1)]

and you could further optimize the sum of last three values by recognizing that if

x = a[n] + a[n+1] + a[n+2]

and you have already computed

y = a[n - 1] + a[n] + a[n + 1]

then

x = y + (a[n + 2] - a[n - 1])

which helps whenever a local variable access and assignment is faster than accessing an element in a series.
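
As a rough sketch of that running-sum idea (not a tuned implementation), the loop below keeps the three-element sum in a local variable and updates it by one addition and one subtraction per step. It is written for Python 3, where range is already lazy, and it walks backwards like the original fcn, so the update drops the element that just left the window on the right and adds the one entering on the left. The name fcn_running_sum and the n_out parameter are hypothetical, introduced only for this illustration; n_out=361 matches the number of values the original range produces.

```python
def fcn_running_sum(data, n_out=361):
    """Same values as fcn, but the 3-element window sum is updated incrementally.

    Assumes len(data) >= n_out + 2 so that every window lies inside the data.
    """
    n = len(data)
    out = []
    window = None
    for i in range(n - 1, n - 1 - n_out, -1):
        if window is None:
            # First step: compute the window sum directly.
            window = data[i - 2] + data[i - 1] + data[i]
        else:
            # The window moved one step left: it covered i-1..i+1, now i-2..i.
            window += data[i - 2] - data[i + 1]
        out.append(3.0 * data[i] / window - 1)
    return out
```

With floating-point data the running sum can accumulate small rounding differences compared with summing each window from scratch, so it is worth comparing the two on real data before relying on it.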

This is great feedback. Your version of fcn runs in half the time as using np.mean! –  strimp099 Aug 3 '11 at 0:46

When using numpy, you should avoid writing loops. Instead you should do operations on the array.

Slicing in NumPy is really cheap, because a basic slice is a view onto the same memory and doesn't actually copy anything.
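
A small check of that claim, using made-up sample values: a basic slice shares memory with the array it came from, so writing through the slice changes the original.

```python
import numpy as np

data = np.arange(10.0)      # illustrative sample data
window = data[3:6]          # basic slice: a view, not a copy

# True when the two arrays overlap in memory.
shares = np.shares_memory(data, window)

# Writing through the view modifies the original array as well.
window[0] = -1.0
```

This suggests the cost in the original loop is more likely the per-iteration call overhead than the slicing itself, though profiling would be the way to confirm that.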

The tricky part in eliminating the loop is the rolling np.mean(), but see this web page for code to help eliminate that: http://www.rigtorp.se/2011/01/01/rolling-statistics-numpy.html
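
For a window as short as three elements, one way to drop the loop without any stride tricks is to add three shifted slices of the array. This is only a sketch of that alternative, not the code from the linked page; fcn_vectorized and n_out are hypothetical names, and n_out=361 mirrors the number of values the original range yields.

```python
import numpy as np

def fcn_vectorized(data, n_out=361):
    """Loop-free equivalent of fcn, assuming len(data) >= n_out + 2."""
    data = np.asarray(data, dtype=float)
    # window_sum[k] == data[k] + data[k + 1] + data[k + 2]
    window_sum = data[:-2] + data[1:-1] + data[2:]
    # ratios[k] corresponds to index i = k + 2 in the original loop.
    ratios = 3.0 * data[2:] / window_sum - 1.0
    # The original walks backwards from the last element, so reverse
    # and keep the first n_out values.
    return ratios[::-1][:n_out]
```

This returns a NumPy array rather than a list; call .tolist() on the result if a list is needed.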

Thanks for the feedback. I'm going to check out np.strides. –  strimp099 Aug 3 '11 at 0:47
