Find indices of numpy array based on values in another numpy array

Question

I want to find the indices in a larger array if they match the values of a different, smaller array. Something like new_array below:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
new_array = np.where(summed_rows == common_sums)

However, this returns:

__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. 
>>>new_array 
(array([], dtype=int64),)

The closest I've gotten is:

new_array = [np.array(np.where(summed_rows == important_sum)) for important_sum in common_sums]

This gives me a list with three numpy arrays (one for each 'important sum'), but each is a different length which produces further downstream problems with concatenation and vstacking. To be clear, I do not want to use the line above. I want to use numpy to index into summed_rows. I've looked at various answers using numpy.where, numpy.argwhere, and numpy.intersect1d, but am having trouble putting the ideas together. I figured I'm missing something simple and it would be faster to ask.

Thanks in advance for your recommendations!

You can do that in one line if I understood properly: (summed_rows==common_sums[:,None]).any(0).nonzero()[0] — Brenlla, yesterday

iblasi · Accepted Answer · 2019-09-23 19:29:06Z

Taking into account the options proposed in the comments, plus extra options based on numpy's in1d:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0]   # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1]   # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0]   # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True

If you time them, all are within the same order of magnitude (roughly 53 to 191 usec per loop), but @Brenlla's option is the fastest:

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
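
The same kind of measurement can also be run from a script with the standard-library timeit module. The following is only a sketch: the dictionary keys and the loop count are made up for illustration, and only two of the five candidates are included. The setup string is executed outside the timed region, so array creation is not counted.

```python
import timeit

# Setup runs once per repeat and is excluded from the measured time.
setup = (
    "import numpy as np; np.random.seed(0); "
    "a = np.random.randint(low=1, high=14, size=9999); "
    "b = np.array([7, 10, 13])"
)

# Two of the candidates from above; the others can be added the same way.
# The key names are hypothetical labels, not part of the original answer.
statements = {
    "broadcast_any": "(a == b[:, None]).any(0).nonzero()[0]",
    "isin_where": "np.where(np.isin(a, b))[0]",
}

loops = 200  # hypothetical loop count, kept small so the script finishes quickly
results = {}
for name, stmt in statements.items():
    # timeit.repeat returns the total seconds for each repeat; keep the best one.
    best = min(timeit.repeat(stmt, setup=setup, number=loops, repeat=3))
    results[name] = best / loops * 1e6  # microseconds per loop
    print(f"{name}: {results[name]:.1f} usec per loop")
```

Absolute numbers will differ by machine and NumPy version, so only the relative ordering is worth comparing.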

You really shouldn't be timing the setup commands. Aside from that, nice work. — Mad Physicist, yesterday
Totally agree. I updated the timeit. I'm surprised about the bad performance of arange against where — iblasi, yesterday
I have only tested ind_1, and it accomplishes my task! Others will have to try the other methods. Could either @iblasi or @Brenlla clarify the actual operations behind ind_1? Correct me if I'm wrong: (summed_rows==common_sums[:,None]) returns True/False array; any() method replaces with Boolean 1/0 array; nonzero() method returns the indices of all 1's in Boolean array? — T Walker, 4 hours ago
(summed_rows==common_sums[:,None]) makes a (3,9999) array of bools, where the first row tells whether each sample of summed_rows is a 7, the 2nd row whether it is a 10, and the 3rd whether it is a 13. any(0) works along axis=0, i.e. it checks whether each column contains any True value across the rows. nonzero then returns the indices of the elements that are non-zero (True). So if you compare against more common_sums, the intermediate array will be (n,9999) and the following operations work the same way. — iblasi, 2 hours ago
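
The three steps described in this comment can be followed on a tiny input. This is a minimal sketch, assuming a six-element stand-in for summed_rows; the intermediate names mask_2d and mask_1d are introduced here only for illustration.

```python
import numpy as np

# Small hypothetical stand-ins for summed_rows and common_sums.
summed_rows = np.array([7, 3, 10, 13, 5, 7])
common_sums = np.array([7, 10, 13])

# Step 1: broadcast shape (3, 1) against shape (6,) to get a (3, 6)
# boolean array, one row per value in common_sums.
mask_2d = summed_rows == common_sums[:, None]

# Step 2: collapse along axis 0; True where a column matched any value.
mask_1d = mask_2d.any(0)

# Step 3: nonzero() returns a tuple of index arrays; [0] takes the only one.
indices = mask_1d.nonzero()[0]

print(mask_2d.shape)  # (3, 6)
print(indices)        # [0 2 3 5]
```

Note that the intermediate array grows as len(common_sums) times len(summed_rows), which may matter for memory if both arrays are large.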

jdehesa · Accepted Answer · 2019-09-23 18:07:22Z

Use np.isin:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7, 10, 13])
new_array = np.where(np.isin(summed_rows, common_sums))
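
One detail worth keeping in mind: called with a single argument, np.where returns a tuple of index arrays, so new_array above is a one-element tuple. A short sketch on a small made-up input shows how to get a plain index array and use it; the names mask, indices and matched_values are illustrative only.

```python
import numpy as np

summed_rows = np.array([7, 3, 10, 13, 5, 7])  # small hypothetical input
common_sums = np.array([7, 10, 13])

mask = np.isin(summed_rows, common_sums)  # boolean array, same shape as summed_rows
new_array = np.where(mask)                # tuple containing one index array
indices = new_array[0]                    # plain 1-D integer array

# Indexing with the indices gives the same result as indexing with the mask.
matched_values = summed_rows[indices]

print(indices)         # [0 2 3 5]
print(matched_values)  # [ 7 10 13  7]
```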
