This is an indirect indexing problem.

It can be solved with a list comprehension.

The question is whether, and how, it can be solved within NumPy.

Suppose data.shape is (T,N) and c.shape is (T,K), and each element of c is an int between 0 and N-1 inclusive; that is, each element of c refers to a column number of data.

The goal is to obtain out, where

out.shape = (T,K)

and, for each i in 0..(T-1), the row

out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]

Concrete example:

import numpy as np

data = np.array([
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

c = np.array([
      [0, 2],
      [1, 2],
      [0, 0],
      [1, 1],
      [2, 2]])

The expected result is out = [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]].

The first row of out is [0,2] because the columns chosen are given by row 0 of c, namely 0 and 2, and data[0] at columns 0 and 2 is 0 and 2.

The second row of out is [4,5] because the columns chosen are given by row 1 of c, namely 1 and 2, and data[1] at columns 1 and 2 is 4 and 5.

NumPy fancy indexing doesn't seem to solve this in an obvious way, because indexing data with c (e.g. data[c] or np.take(data,c,axis=1)) always produces a 3-dimensional array.
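To make the shape issue concrete, here is a small check using the data and c from the example. It is only a sketch; the variable names rows_picked and all_pairs are mine, and the shapes in the comments follow from how NumPy indexes a (5,3) array with a (5,2) integer array.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example data
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])

# Indexing with c alone treats each entry of c as a ROW number of data,
# so every entry pulls out a whole row of length N.
rows_picked = data[c]
print(rows_picked.shape)             # (5, 2, 3)

# np.take along axis 1 applies every row of c to every row of data:
# all_pairs[i, j, k] == data[i, c[j, k]].
all_pairs = np.take(data, c, axis=1)
print(all_pairs.shape)               # (5, 5, 2)
```

In the np.take result the wanted values are the entries with i == j, so the answer is in there, but at the cost of computing T times more values than needed.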

A list comprehension can solve it:

out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]

If K is 2, I suppose this is marginally OK. If K is variable, this is not so good.

The list comprehension has to be rewritten for each value of K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
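For what it's worth, the unrolling is specific to that way of writing the comprehension. A nested comprehension, sketched below with the example arrays, works for any K. It is still a Python-level loop, so it does not answer the pure-NumPy question.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example data
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])

# The outer loop walks the rows; the inner loop walks that row's column
# picks, so nothing is unrolled and K can be anything.
out = np.array([[data[i, j] for j in row] for i, row in enumerate(c)])
print(out.tolist())   # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
```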

Is there a solution based entirely in numpy?

Accepted answer (2 votes):

You can avoid loops with np.choose:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.

data = np.array([\
       [ 0,  1,  2],\
       [ 3,  4,  5],\
       [ 6,  7,  8],\
       [ 9, 10, 11],\
       [12, 13, 14]])

c = np.array([
      [0, 2],\
      [1, 2],\
      [0, 0],\
      [1, 1],\
      [2, 2]])
--

In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]: 
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
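A short sketch of why this works, plus two alternatives that were not part of the answer above: plain integer-array indexing with an explicit row index, and np.take_along_axis (which assumes NumPy 1.15 or newer). The names via_choose, via_index and via_take are only for illustration.

```python
import numpy as np

data = np.arange(15).reshape(5, 3)   # same values as the example data
c = np.array([[0, 2], [1, 2], [0, 0], [1, 1], [2, 2]])
T = data.shape[0]

# np.choose treats its second argument as a sequence of N "choice" arrays.
# data.T[:, :, np.newaxis] has shape (N, T, 1): choice n is column n of
# data as a (T, 1) column vector, which broadcasts against c's (T, K).
via_choose = np.choose(c, data.T[:, :, np.newaxis])

# Alternative 1: pair each entry of c with its own row number.
rows = np.arange(T)[:, np.newaxis]   # shape (T, 1), broadcasts to (T, K)
via_index = data[rows, c]

# Alternative 2: np.take_along_axis (assumes NumPy 1.15 or newer).
via_take = np.take_along_axis(data, c, axis=1)

print(np.array_equal(via_choose, via_index) and np.array_equal(via_index, via_take))
```

One caveat worth checking against your NumPy version: np.choose has historically capped the number of choice arrays (N here) at a few dozen, so the two alternatives may be the safer choice when data has many columns.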
    
Nice! I didn't think to use choose. – ajcr Oct 6 '14 at 22:04
    
Yup, it takes a while to wrap your head around its possible uses. – immerrr Oct 6 '14 at 22:24
    
Thanks. This was what I was looking for. – Paul Oct 7 '14 at 0:37

Here's one possible route to a general solution...

Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:

>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)

>>> data[_]
array([ 0,  4,  6, 10, 14])

The mask for the second column of out is built the same way: np.arange(3) == np.vstack(c[:,1]).

So, to get the out array...

>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])

Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:

np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T

Edit 2: A slightly neater solution (though still relying on a loop) is:

np.vstack([data[i][c[i]] for i in range(T)])
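As a quick sanity check, the two loop-based constructions can be compared on arrays where K is not 2. This is only a sketch: the sizes T, N, K and the seed are arbitrary choices of mine, and np.random.default_rng assumes NumPy 1.17 or newer.

```python
import numpy as np

rng = np.random.default_rng(0)       # seed chosen arbitrarily
T, N, K = 6, 4, 3                    # hypothetical sizes for the check
data = rng.integers(0, 100, size=(T, N))
c = rng.integers(0, N, size=(T, K))  # column numbers in 0..N-1

# One mask per column of out; each mask has exactly one True per row,
# so data[mask] yields T values in row order.
by_mask = np.vstack([data[np.arange(N) == np.vstack(c[:, i])] for i in range(K)]).T

# One fancy-indexed row of out per row of data.
by_row = np.vstack([data[i][c[i]] for i in range(T)])

print(by_mask.shape, np.array_equal(by_mask, by_row))
```

The mask version loops K times and the row version loops T times, so which one is cheaper depends on which of the two dimensions is smaller.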
    
This is interesting, and I'll have to look up vstack and see what it does.... But it also unfortunately seems to depend on K. K might not always be 2. – Paul Oct 6 '14 at 19:31
I see... I've edited my answer to adapt to the more general case where K might be large. I'll see if I can think of any other way to avoid loops completely... – ajcr Oct 6 '14 at 19:45
