Join the Stack Overflow Community
Stack Overflow is a community of 6.7 million programmers, just like you, helping each other.
Join them; it only takes a minute:
Sign up

I have a numpy array that has dimensions (x, y, z) = (5, 50, 4). For every (x, y) pair, I want to find the index of the maximum value along the z axis. This index is in range(4). I want to select all of these "maximum" elements and set them to 1. Then, I want to select all other elements and set them to zero.

To explain it another way, I want to look at all vectors in the "z" direction (there are x*y of these vectors total). I want to set the maximum element to 1 and all other elements to 0. For example, the vector (0.25, 0.1, 0.5, 0.15) will become (0, 0, 1, 0).

I've tried many different things. The argmax function seems like it should help. But how do I use it to select elements correctly? I have tried things like...

x = data
i = x.argmax(axis = 2)
x[i] # shape = (5, 50, 50, 4)
x[:,:,i] # shape = (5, 50, 5, 50)
x[np.unravel_index(i), x.shape] # shape = (5, 50)

The last one, which uses np.unravel_index, has the correct shape, but the selected indices are NOT the maximum values along the z axis. So I'm having some trouble. If anyone could help at all, it would be really awesome. Thanks!


Edit: Here is a way I have found to do this. But if anyone has anything that is faster, please let me know!

def fix_vector(a):
    i = a.argmax()
    a = a*0
    a[i] = 1
    return a

y = np.apply_along_axis(fix_vector, axis=2, arr=x)

I would really like to optimize this if possible, since I call this function MANY times.


Edit: Thanks DSM for a nice solution. Here is a small example dataset, as requested in the comments.

data = np.random.random((3,5,4))
desired_output = np.apply_along_axis(fix_vector, axis=2, arr=data)

This uses the fix_vector function I posted above, but DSM's solution is faster. Thanks again!

share|improve this question
1  
You should add a, small, example of the input array and the desired result after processing. – wwii Aug 13 '15 at 3:40

This isn't particularly elegant, but:

def faster(x):
    d = x.reshape(-1, x.shape[-1])
    d2 = np.zeros_like(d)
    d2[np.arange(len(d2)), d.argmax(1)] = 1
    d2 = d2.reshape(x.shape)
    return d2

seems a little faster than the fix_vector approach. For example:

>>> x,y,z = 5,50,4
>>> data = np.random.random((x,y,z))
>>> np.allclose(orig(data), faster(data))
True
>>> %timeit -n 1000 orig(data)
1000 loops, best of 3: 5.77 ms per loop
>>> %timeit -n 1000 faster(data)
1000 loops, best of 3: 36.6 µs per loop
share|improve this answer

This can be done with numpy.where

import numpy as np
a = np.array([[[ 0.25,  0.10 ,  0.50 ,  0.15],
               [ 0.50,  0.60 ,  0.40 ,  0.30]],
              [[ 0.25,  0.50 ,  0.20 ,  0.70],
               [ 0.80,  0.10 ,  0.50 ,  0.15]]])

Find the max value along the last axis

b = a.max(-1)    #shape is (2,2)

Add an axis to b so that it will broadcast across a and create a boolean array.

condition = a == b[..., np.newaxis]

Use numpy.where for the substitution.

c = np.where(condition, 1, 0)

>>> c
array([[[0, 0, 1, 0],
        [0, 1, 0, 0]],

       [[0, 0, 0, 1],
        [1, 0, 0, 0]]])

def f(a):
    b = a.max(-1)
    condition = a == b[..., np.newaxis]
    return np.where(condition, 1, 0)

So I wasn't satisfied - I've seen cases where reshape is faster than adding a new axis so I played around a bit. Seems numpy.where itself is a tad slow. Using boolean indexing with an assignment gives pretty much the same performance as @DSM's.

def h(a):
    b = a.max(-1)
    condition = a == b[..., np.newaxis]
    a[condition] = 1
    a[np.logical_not(condition)] = 0
    return a
share|improve this answer
    
Comparable but a tad (50%) slower than @DSM's - looks like they both scale the same with the size of the array. – wwii Aug 13 '15 at 4:26
    
Note that conceivably this could result in multiple 1s in a row if the maximum isn't unique. But for all we know, this might be what the OP wants to happen in that case (the question is ambiguous on the point). – DSM Aug 13 '15 at 4:26
    
@DSM - yours chooses the lowest index for multiple maximums in a row? – wwii Aug 13 '15 at 4:47
    
thanks for the solution, wwii. this is exactly what i was looking for. just fyi, for my application, i never encounter two identical values, so both solutions should work. – adn bps Aug 13 '15 at 14:53

Sample x, (2,3,4) in shape:

In [934]: x
Out[934]: 
array([[[ 5,  7,  4, 15],
        [ 9, 23, 14,  9],
        [17, 13,  2,  1]],

       [[ 2, 18,  3, 18],
        [10, 12, 23, 14],
        [17, 23, 19, 13]]])

The max values along the last axis - clear enough, a (2,3) array:

In [935]: x.max(2)
Out[935]: 
array([[15, 23, 17],
       [18, 23, 23]])

Cooresponding indexes along that axis

In [936]: I=x.argmax(2)

In [937]: I
Out[937]: 
array([[3, 1, 0],
       [1, 2, 1]], dtype=int32)

One way of indexing x with those indexes. At this point I'm constructing the 0 and 1 axis indexes manually. It needs to be automated

In [938]: x[np.arange(2)[:,None], np.arange(3)[None,:], I]
Out[938]: 
array([[15, 23, 17],
       [18, 23, 23]])

The same indexes can be used to set values in another array:

In [939]: y=np.zeros_like(x)   
In [940]: y[np.arange(2)[:,None],np.arange(3)[None,:],I]=x.max(2)
In [941]: y
Out[941]: 
array([[[ 0,  0,  0, 15],
        [ 0, 23,  0,  0],
        [17,  0,  0,  0]],

       [[ 0, 18,  0,  0],
        [ 0,  0, 23,  0],
        [ 0, 23,  0,  0]]])

The trick is to come up with a more streamlined way of generating the indexing tuple:

(np.arange(2)[:,None], np.arange(3)[None,:], I)

(array([[0],
    [1]]), 
 array([[0, 1, 2]]), 
 array([[3, 1, 0],
    [1, 2, 1]], dtype=int32))

np.ix_ helps, though concatenating 2 tuples is a little ugly:

J = np.ix_(range(x.shape[0]), range(x.shape[1])) + (I,)

Indexing on a flattened array can be faster:

In [998]: J1 = np.ravel_multi_index(J, x.shape)
In [999]: J1
Out[999]: 
array([[ 3,  5,  8],
     [13, 18, 21]], dtype=int32)

In [1000]: x.flat[J1]
Out[1000]: 
array([[15, 23, 17],
   [18, 23, 23]])

In [1001]: y.flat[J1]=2

In [1002]: y
Out[1002]: 
array([[[0, 0, 0, 2],
        [0, 2, 0, 0],
        [2, 0, 0, 0]],

       [[0, 2, 0, 0],
        [0, 0, 2, 0],
        [0, 2, 0, 0]]])

But I still need to construct the J indexing tuple.

This is similar to DSMs solution, but sticks with the 3 dimensions. Flattening 2 of the dimensions eliminates the need for the xi_ construction.


def max_index(x, n, fn=np.argmax):
    # return multidimensional indexing tuple
    # applying fn along axis n
    I = fn(x, axis=n)
    alist = list(np.ix_(*[range(i) for i in I.shape]))
    alist.insert(n, I)
    return tuple(alist)

is a general purpose function to generate the J like indexing tuple

e.g.

y = np.zeros_like(x)
I = max_index(x, 1, np.argmin)
y[I] = x[I]
I = max_index(x, 1, np.argmax)
y[I] = x[I]

inserts both the minimum and maximum values of x (along axis 1) into y. Speed is about half of DSMs faster.

share|improve this answer
    
thanks for the solution, hpaulj. i think DSM and wwii provided more elegant solutions, but it was interesting to see how this works too. – adn bps Aug 13 '15 at 14:57

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.