Python/Numpy - Get Index into Main Array from Subset

Question

Say I have a 100-element numpy array, a. I perform some calculation on a subset of this array, maybe 20 elements where some condition is met. Then I pick an index in this subset. How can I (efficiently) recover the corresponding index in the original array? I don't want to perform the calculation on all values in a because it is expensive, so I only want to perform it where it is required (where that condition is met).

Here is some pseudocode to demonstrate what I mean (the 'condition' here is the list comprehension):

a = np.arange(100)                                 # size = 100
b = some_function(a[[i for i in range(0,100,5)]])  # size = 20
Index = np.argmax(b)

# Index gives the index of the maximum value in b,
# but what I really want is the index of the element
# in a

EDIT:

I wasn't being very clear, so I've provided a fuller example. I hope this makes my goal clearer. I feel like there is some clever and efficient way to do this without loops or lookups.

CODE:

import numpy as np

def some_function(arr):
   return arr*2.0

a = np.arange(100)*2.                              # size = 100
b = some_function(a[[i for i in range(0,100,5)]])  # size = 20
Index = np.argmax(b)

print(Index)
# Index gives the index of the maximum value in b, but what I really want is
# the index of the element in a

# In this specific case, Index will be 19.  So b[19] is the largest value
# in b.  Now, what I REALLY want is the index in a.  In this case, that would
# be 95 because some_function(a[95]) is what made the largest value in b.
print(b[Index])
print(some_function(a[95]))

# It is important to note that I do NOT want to change a.  I will perform
# several calculations on SOME values of a, then return the indices of 'a' where
# all calculations meet some condition.

Not sure if I understand your question, but do you want to find out the 20 indices calculated in the second step? How does the final Index relate to it? — ejel, Apr 23 '11 at 2:23
@ejel: To try to explain it, let's just say that some_function ignores the input array and just returns an array of random integers the same length as the input. Then Index would contain the index into b that has the largest (random) number. The index in b really corresponds to some index in a, and that index in a is what I want. — Scott B, Apr 23 '11 at 3:47

Alok Singhal · Accepted Answer · 2011-04-23 03:41:38Z


I am not sure if I understand your question. So, correct me if I am wrong.

Let's say you have something like

a = np.arange(100)
condition = (a % 5 == 0) & (a % 7 == 0)
b = a[condition]
index = np.argmax(b)
# The following should do what you want
a[condition][index]

Or if you don't want to work with masks:

a = np.arange(100)
b_indices = np.where(a % 5 == 0)
b = a[b_indices]
index = np.argmax(b)
# Get the value of 'a' corresponding to 'index'
a[b_indices][index]

Is this what you want?


I updated my question to make it more clear. In your code, a[condition][index] returns the value in a, but I want the INDEX in a, so that a[INDEX] = a[condition][index]. Is there an easy way to get INDEX from condition and index? I imagine there is, but it isn't obvious to me. – Scott B Apr 23 '11 at 4:02


np.arange(len(a))[condition][index] perhaps? – Alok Singhal Apr 23 '11 at 4:31

That works, thanks. – Scott B Apr 26 '11 at 4:02
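For anyone landing here later, a minimal sketch of the suggestion from the comment above, applied to the question's example. It assumes the condition is available as a boolean mask over a; np.flatnonzero is used as an alternative to np.arange(len(a))[condition], and the two should agree.

```python
import numpy as np

def some_function(arr):
    # Stand-in for the expensive calculation (same as in the question)
    return arr * 2.0

a = np.arange(100) * 2.0
# Boolean mask over a: True at positions 0, 5, ..., 95
condition = (np.arange(len(a)) % 5 == 0)

b = some_function(a[condition])   # calculation on the subset only
index = np.argmax(b)              # index into b

# Positions in a where the mask is True; element k is the a-index of b[k]
positions = np.flatnonzero(condition)
a_index = positions[index]

# Same result as the expression from the comment
assert a_index == np.arange(len(a))[condition][index]
assert some_function(a[a_index]) == b[index]
```

If several indices into b are needed, positions[indices] translates them all at once.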


thouis · Answer 2 · 2011-04-23 05:54:30Z

Use a secondary array, a_index, which just holds the indices of the elements of a, so that a_index[3,5] == (3,5). Then you can get the original index as a_index[condition][Index].
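A rough sketch of the secondary index array idea for a 2-D array. The array contents and the odd-value condition are made up for illustration, and building a_index with np.indices and np.stack is just one possible way to do it.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

# a_index[i, j] holds (i, j): stack the row and column grids on the last axis
a_index = np.stack(np.indices(a.shape), axis=-1)   # shape (3, 4, 2)

condition = (a % 2 == 1)      # hypothetical condition: odd values only
b = a[condition] * 2.0        # calculation performed on the subset only
Index = np.argmax(b)          # index into the (flattened) subset

# Applying the same mask to a_index keeps the (row, col) pairs aligned with b
orig = tuple(a_index[condition][Index])

assert a[orig] * 2.0 == b[Index]
```

The cost is one extra integer array the size of a per dimension, which may or may not matter for large arrays.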

If you can guarantee that b is a view on a, you can use the memory layout information of the two arrays to find a translation between b's and a's indices.
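One way the memory-layout translation might look, as a sketch only. It assumes a is C-contiguous and owns its data, that b was made from a by basic slicing with the same number of dimensions, and it reads data addresses via __array_interface__. The helper to_a_index is a hypothetical name introduced here.

```python
import numpy as np

a = np.arange(50.0).reshape(10, 5)
b = a[2:4, 2:5]                       # basic slice, so b is a view on a
assert np.shares_memory(a, b)

# Byte offset of b's first element from a's first element
offset = b.__array_interface__['data'][0] - a.__array_interface__['data'][0]

# Index in a of b[0, 0] (valid because a is assumed C-contiguous)
origin = np.unravel_index(offset // a.itemsize, a.shape)

# Step of b along each axis, in units of a's step along that axis
steps = [sb // sa for sb, sa in zip(b.strides, a.strides)]

def to_a_index(b_index):
    # Hypothetical helper: translate an index tuple in b to one in a
    return tuple(o + s * i for o, s, i in zip(origin, steps, b_index))

b_index = np.unravel_index(np.argmax(b), b.shape)
a_index = to_a_index(b_index)

assert a[a_index] == b[b_index]
```

This does not apply to the question's example as written, since indexing with a list or a boolean mask returns a copy rather than a view.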

Paul · Answer 3 · 2011-04-23 04:30:10Z

Normally you'd store the index based on the condition before you made any changes to the array. You use the index to make the changes.

If a is your array:

>>> a = np.random.random((10,5))
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  0.16324907,  0.20751965,  0.15903343],
       [ 0.55861168,  0.64368466,  0.67676172,  0.67871825,  0.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

And b is your subarray:

>>> b = a[2:4,2:7]
>>> b
array([[ 0.16324907,  0.20751965,  0.15903343],
       [ 0.67676172,  0.67871825,  0.01849056]])

It can be shown that a still owns the data in b:

>>> b.base
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  0.16324907,  0.20751965,  0.15903343],
       [ 0.55861168,  0.64368466,  0.67676172,  0.67871825,  0.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

You can make changes to both a and b in two ways:

>>> b+=1
>>> b
array([[ 1.16324907,  1.20751965,  1.15903343],
       [ 1.67676172,  1.67871825,  1.01849056]])
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  1.16324907,  1.20751965,  1.15903343],
       [ 0.55861168,  0.64368466,  1.67676172,  1.67871825,  1.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])

Or:

>>> a[2:4,2:7]+=1
>>> a
array([[ 0.22481885,  0.80522855,  0.1081426 ,  0.42528799,  0.64471832],
       [ 0.28044374,  0.16202575,  0.4023426 ,  0.25480368,  0.87047212],
       [ 0.84764143,  0.30580141,  1.16324907,  1.20751965,  1.15903343],
       [ 0.55861168,  0.64368466,  1.67676172,  1.67871825,  1.01849056],
       [ 0.90980614,  0.95897292,  0.15649259,  0.39134528,  0.96317126],
       [ 0.20172827,  0.9815932 ,  0.85661944,  0.23273944,  0.86819205],
       [ 0.98363954,  0.00219531,  0.91348196,  0.38197302,  0.16002007],
       [ 0.48069675,  0.46057327,  0.67085243,  0.05212357,  0.44870942],
       [ 0.7031601 ,  0.50889065,  0.30199446,  0.8022497 ,  0.82347358],
       [ 0.57058441,  0.38748261,  0.76947605,  0.48145936,  0.26650583]])
>>> b
array([[ 1.16324907,  1.20751965,  1.15903343],
       [ 1.67676172,  1.67871825,  1.01849056]])

Both are equivalent and neither is more expensive than the other. Therefore as long as you retain the indices that created b from a, you can always view the changed data in the base array. Often it is not even necessary to create a subarray when doing operations on slices.
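A caveat, sketched below with the array from the question: the shared-memory behaviour shown above applies to basic slices. Indexing with a list of positions (as the question does) or with a boolean mask gives a copy, so in that case the stored indices are the only link back to a.

```python
import numpy as np

a = np.arange(100) * 2.0

view = a[0:100:5]                     # basic slice: a view sharing a's memory
copy = a[list(range(0, 100, 5))]      # index list: an independent copy

assert np.shares_memory(a, view)
assert not np.shares_memory(a, copy)
assert view.base is a

# Same values either way; only the memory relationship differs
assert np.array_equal(view, copy)
```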

Edit

This assumes some_func returns the indices in the subarray where some condition is true.

I think when a function returns indices and you only want to feed that function a subarray, you still need to store the indices of that subarray and use them to get the base array indices. For example:

>>> def some_func(a):
...     return np.where(a>.8)
>>> a = np.random.random((10,4))
>>> a
array([[ 0.94495378,  0.55532342,  0.70112911,  0.4385163 ],
       [ 0.12006191,  0.93091941,  0.85617421,  0.50429453],
       [ 0.46246102,  0.89810859,  0.31841396,  0.56627419],
       [ 0.79524739,  0.20768512,  0.39718061,  0.51593312],
       [ 0.08526902,  0.56109783,  0.00560285,  0.18993636],
       [ 0.77943988,  0.96168229,  0.10491335,  0.39681643],
       [ 0.15817781,  0.17227806,  0.17493879,  0.93961027],
       [ 0.05003535,  0.61873245,  0.55165992,  0.85543841],
       [ 0.93542227,  0.68104872,  0.84750821,  0.34979704],
       [ 0.06888627,  0.97947905,  0.08523711,  0.06184216]])
>>> i_off, j_off = 3,2
>>> b = a[i_off:,j_off:]  # subarray of a
>>> i = some_func(b)  # indices in b
>>> i
(array([3, 4, 5]), array([1, 1, 0]))
>>> [idx + off for idx, off in zip(i, (i_off, j_off))]  # indices in a
[array([6, 7, 8]), array([3, 3, 2])]

Edit 2

This assumes some_func returns a modified copy of the subarray b.

Your example would look something like this:

import numpy as np

def some_function(arr):
   return arr*2.0

a = np.arange(100)*2.                              # size = 100
idx = np.arange(0, 100, 5)  # indices into a used to build the subset
b = some_function(a[idx])  # size = 20
b_idx = np.argmax(b)
a_idx = idx[b_idx]  # indices in a translated from indices in b

print(b_idx, a_idx)
print(b[b_idx], a[a_idx])

assert b[b_idx] == 2* a[a_idx]  #true!

Yes, I do understand how I can use the indices like that. But for my particular application, that isn't what I need. Probably my example wasn't the best. This code goes into a function, and the function needs to return the indices into the array where certain conditions are met. So the function might be something like def some_function(arr), and it returns the indices in arr that meet a series of conditions. I do not intend to ever change the values of the array. — Scott B, Apr 23 '11 at 3:43
See my edit. I don't see any way to get the indices that locate a subarray in its base array. That would be nice. I think you just need to store the (base array) indices that you used to create the subarray and then apply them as an offset to the returned (subarray) indices. — Paul, Apr 23 '11 at 4:06
If idx is instead a boolean array (by doing something like idx = a%5==0), then this doesn't work, though with a slight modification it could. It then ends up being very similar to what Alok said. As always, thanks for the suggestions Paul. — Scott B, Apr 26 '11 at 4:04
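For the boolean case mentioned in the last comment, the slight modification could be to turn the mask into integer positions first. The same idea then chains over several conditions, each later (more expensive) one evaluated only on the survivors. This is only a sketch: first_check, second_check and indices_meeting_all are hypothetical names, and the conditions themselves are made up.

```python
import numpy as np

def first_check(x):
    # Hypothetical cheap test, evaluated on the whole array
    return x % 5 == 0

def second_check(x):
    # Hypothetical expensive test, evaluated only on the survivors
    return x * 2.0 > 100

def indices_meeting_all(a):
    # Integer positions in a that pass the first condition
    candidates = np.flatnonzero(first_check(a))
    # Boolean mask over the candidates, same length as candidates
    keep = second_check(a[candidates])
    # Narrow the positions; they remain indices into a, and a is never modified
    return candidates[keep]

a = np.arange(100)
result = indices_meeting_all(a)
assert np.all(first_check(a[result]) & second_check(a[result]))
```

More conditions can be added by repeating the last two steps, since candidates always holds indices into the original array.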

asked	3 years ago
viewed	2960 times
active	3 years ago
