Question

3

1

I have a numpy array [1,2,3,4,5,6,7,8,9,10,11,12,13,14] and want to have an array structured like [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]].

Sure this is possible by looping over the large array and adding arrays of length four to the new array, but I'm curious if there is some secret 'magic' Python method doing just this :)

flag	asked Mar 21 at 2:38 Snej 79●5 69% accept rate

gnibbler · Answer 1 · 2010-03-21 09:16:38Z

The fastest way seems to be to preallocate the array, given as option 7 right at the bottom of this answer

>>> import numpy as np
>>> A=np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14])
>>> A
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> np.array(zip(A,A[1:],A[2:],A[3:]))
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])
>>>

You can easily adapt this to do it for variable chunk size

>>> n=5
>>> np.array(zip(*(A[i:] for i in range(n))))
array([[ 1,  2,  3,  4,  5],
       [ 2,  3,  4,  5,  6],
       [ 3,  4,  5,  6,  7],
       [ 4,  5,  6,  7,  8],
       [ 5,  6,  7,  8,  9],
       [ 6,  7,  8,  9, 10],
       [ 7,  8,  9, 10, 11],
       [ 8,  9, 10, 11, 12],
       [ 9, 10, 11, 12, 13],
       [10, 11, 12, 13, 14]])

You may wish to compare performance between this and using itertools.islice

>>> from itertools import islice
>>> n=4
>>> np.array(zip(*[islice(A,i,None) for i in range(n)]))
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

My timing results:

1. timeit np.array(zip(A,A[1:],A[2:],A[3:]))
10000 loops, best of 3: 92.9 us per loop

2. timeit np.array(zip(*(A[i:] for i in range(4))))
10000 loops, best of 3: 101 us per loop

3. timeit np.array(zip(*[islice(A,i,None) for i in range(4)]))
10000 loops, best of 3: 101 us per loop

4. timeit numpy.array([ A[i:i+4] for i in range(len(A)-3) ])
10000 loops, best of 3: 37.8 us per loop

5. timeit numpy.array(list(chunks(A, 4)))
10000 loops, best of 3: 43.2 us per loop

6. timeit numpy.array(byN(A, 4))
10000 loops, best of 3: 100 us per loop

# Does preallocation of the array help? (11 is from len(A)+1-4)
7. timeit B=np.zeros(shape=(11, 4),dtype=np.int32)
100000 loops, best of 3: 2.19 us per loop
   timeit for i in range(4):B[:,i]=A[i:11+i]
10000 loops, best of 3: 20.9 us per loop
total 23.1us per loop

As len(A) increases (20000) 4 and 5 converge to be equivalent speed (44ms). 1,2,3 and 6 all remain about 3 times slower(135ms). 7 is much faster (1.36ms)

Federico Ramponi · Answer 2 · 2010-03-21 02:56:43Z

Quick&dirty solution:

>>> a = numpy.arange(1,15)
>>> numpy.array([ a[i:i+4] for i in range(len(a)-3) ])
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

Alex Martelli · Answer 3 · 2010-03-21 03:27:57Z

1

Using itertools, and assuming Python 2.6:

import itertools

def byN(iterable, N):
    itrs = itertools.tee(iter(iterable), N)
    for n in range(N):
        for i in range(n):
            next(itrs[n], None)
    return zip(*itrs)

aby4 = numpy.array(byN(thearray, 4))

edited Mar 21 at 3:27

answered Mar 21 at 3:04

Alex Martelli
89.2k●3●81●213

	This code doesn't work. numpy would interpret the `izip` object as a scalar and make a rank-1, length-1 array of dtype `object` out of it. – Mike Graham Mar 21 at 3:06
	Also, you need to `from itertools import tee, izip` or use the fully-qualified names. – Mike Graham Mar 21 at 3:08
	I thought numpy.array accepted any iterable -- editing to fix. – Alex Martelli Mar 21 at 3:09
	Also, you have backwards logic in traversing the iterators you teed. The columns in the resulting array would be backwards if the code did work. – Mike Graham Mar 21 at 3:10
	@Alex Martelli, `numpy.array` doesn't. It can be frustrating, but I think the reason is that it needs to preallocate the array so it wants to know the size of the input ahead of time. – Mike Graham Mar 21 at 3:10

show 2 more comments

bpowah · Answer 4 · 2010-03-21 23:17:08Z

You should use stride_tricks. When I first saw this, the word 'magic' did spring to mind. It's simple and is by far the fastest method.

>>> as_strided = numpy.lib.stride_tricks.as_strided
>>> a = numpy.arange(1,15)
>>> a
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> b = as_strided(a,(11,4),(4,4))
>>> b
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

Be aware that the values in array b are those in a, just viewed differently. Do a .copy() on b if you plan to modify it.

I saw this at a SciPy conference. Here are the slides for more explanation.

Mike Graham · Answer 5 · 2010-03-21 02:50:41Z

I know of no Python stdlib function that quite does that. It's easy enough to do. Here is a generator that basically does it:

def chunks(sequence, length):
    for index in xrange(0, len(sequence) - length + 1):
        yield sequence[index:index + length]

You could use it like so

>>> import numpy
>>> a = numpy.arange(1, 15)
>>> a
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> numpy.array(list(chunks(a, 4)))
array([[ 1,  2,  3,  4],
       [ 2,  3,  4,  5],
       [ 3,  4,  5,  6],
       [ 4,  5,  6,  7],
       [ 5,  6,  7,  8],
       [ 6,  7,  8,  9],
       [ 7,  8,  9, 10],
       [ 8,  9, 10, 11],
       [ 9, 10, 11, 12],
       [10, 11, 12, 13],
       [11, 12, 13, 14]])

The only weird thing about this code is that I called list on the result of chunks(a, 4). This is since numpy.array does not accept an arbitrary iterable, such as the generator chunks returns. If you just want to iterate over these chunks, you don't need to bother. If you really need to put the result into an array you can do it this way or some more efficient ways.

Autoplectic · Answer 6 · 2010-03-21 05:08:59Z

the efficient numpy way to do this is given here, which is somewhat too long to reproduce here. it boils down to using some stride tricks, and is much faster than itertools for largish window sizes. for example, using a method essentially the same as that of of alex martelli's:

In [16]: def windowed(sequence, length):
    seqs = tee(sequence, length)
    [ seq.next() for i, seq in enumerate(seqs) for j in xrange(i) ]
    return zip(*seqs)

we get:

In [19]: data = numpy.random.randint(0, 2, 1000000)

In [20]: %timeit windowed(data, 2)
100000 loops, best of 3: 6.62 us per loop
In [21]: %timeit windowed(data, 10)
10000 loops, best of 3: 29.3 us per loop
In [22]: %timeit windowed(data, 100)
1000 loops, best of 3: 1.41 ms per loop
In [23]: %timeit segment_axis(data, 2, 1)
10000 loops, best of 3: 30.1 us per loop
In [24]: %timeit segment_axis(data, 10, 9)
10000 loops, best of 3: 30.2 us per loop
In [25]: %timeit segment_axis(data, 100, 99)
10000 loops, best of 3: 30.5 us per loop

meduz · Answer 7 · 2010-03-24 14:06:48Z

Broadcast!

from numpy import ogrid
def stretch(N=5,M=15):
    x, y = ogrid[0:M,0:N]
    return x+y+1

Note that ogrid does give things like:

>> ogrid[0:5,0:5]
>> 
[array([[0],
       [1],
       [2],
       [3],
       [4]]),
 array([[0, 1, 2, 3, 4]])]

Stretch array (Numpy, Python)

7 Answers

My timing results:

Your Answer

Not the answer you're looking for? Browse other questions tagged python numpy scipy or ask your own question.

Hello World!

Related