I want to find the indices in a larger array where its values match any value in a different, smaller array. Something like new_array below:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7,10,13])
new_array = np.where(summed_rows == common_sums)

However, this returns:

__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future. 
>>>new_array 
(array([], dtype=int64),)

The closest I've gotten is:

new_array = [np.array(np.where(summed_rows==important_sum)) for important_sum in common_sums]

This gives me a list of three numpy arrays (one for each 'important sum'), but each has a different length, which causes further downstream problems with concatenation and vstacking. To be clear, I do not want to use the line above; I want to use numpy to index into summed_rows. I've looked at various answers using numpy.where, numpy.argwhere, and numpy.intersect1d, but am having trouble putting the ideas together. I figured I'm missing something simple and it would be faster to ask.

Thanks in advance for your recommendations!

  • You can do that in one line if I understood properly: (summed_rows==common_sums[:,None]).any(0).nonzero()[0] – Brenlla yesterday
  • np.where(summed_rows == common_sums[:, None])[1] – Ravi Sharma yesterday
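
A minimal sketch of why the original comparison fails and why the `[:, None]` versions in the comments work, assuming NumPy 1.20 or later (for `np.broadcast_shapes`). The small array `a` is made up for illustration; only the shapes matter.

```python
import numpy as np

a = np.array([7, 3, 10, 7, 13, 2])   # made-up stand-in for summed_rows
b = np.array([7, 10, 13])            # common_sums

# Shapes (6,) and (3,) cannot be broadcast against each other,
# so `a == b` has no elementwise meaning.
try:
    np.broadcast_shapes(a.shape, b.shape)
    compatible = True
except ValueError:
    compatible = False

# Adding an axis turns b into shape (3, 1), which broadcasts with (6,) to (3, 6).
mask = a == b[:, None]
rows, cols = np.where(mask)   # rows: which value of b matched; cols: index into a
```

Here `cols` comes back grouped by matched value (`[0, 3, 2, 4]`) rather than in ascending order, so it may need sorting before use.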
2

Taking into account the options proposed in the comments, and adding a few more based on numpy's in1d and isin:

>>> import numpy as np
>>> summed_rows = np.random.randint(low=1, high=14, size=9999)
>>> common_sums = np.array([7,10,13])
>>> ind_1 = (summed_rows==common_sums[:,None]).any(0).nonzero()[0]   # Option of @Brenlla
>>> ind_2 = np.where(summed_rows == common_sums[:, None])[1]   # Option of @Ravi Sharma
>>> ind_3 = np.arange(summed_rows.shape[0])[np.in1d(summed_rows, common_sums)]
>>> ind_4 = np.where(np.in1d(summed_rows, common_sums))[0]
>>> ind_5 = np.where(np.isin(summed_rows, common_sums))[0]   # Option of @jdehesa

>>> np.array_equal(np.sort(ind_1), np.sort(ind_2))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_3))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_4))
True
>>> np.array_equal(np.sort(ind_1), np.sort(ind_5))
True

If you time them, most are in the same range, but @Brenlla's option is the fastest (52.7 usec), while @Ravi Sharma's is the slowest (191 usec):

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_1 = (a==b[:,None]).any(0).nonzero()[0]'
10000 loops, best of 3: 52.7 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_2 = np.where(a == b[:, None])[1]'
10000 loops, best of 3: 191 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_3 = np.arange(a.shape[0])[np.in1d(a, b)]'
10000 loops, best of 3: 103 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_4 = np.where(np.in1d(a, b))[0]'
10000 loops, best of 3: 63 usec per loop

python -m timeit -s 'import numpy as np; np.random.seed(0); a = np.random.randint(low=1, high=14, size=9999); b = np.array([7,10,13])' 'ind_5 = np.where(np.isin(a, b))[0]'
10000 loops, best of 3: 67.1 usec per loop
  • You really shouldn't be timing the setup commands. Aside from that, nice work. – Mad Physicist yesterday
  • Totally agree. I updated the timeit. I'm surprised about the bad performance of arange against where – iblasi yesterday
  • Bad is very relative here. – Mad Physicist yesterday
  • I have only tested ind_1, and it accomplishes my task! Others will have to try the other methods. Could either @iblasi or @Brenlla clarify the actual operations behind ind_1? Correct me if I'm wrong: (summed_rows==common_sums[:,None]) returns True/False array; any() method replaces with Boolean 1/0 array; nonzero() method returns the indices of all 1's in Boolean array? – T Walker 4 hours ago
  • (summed_rows==common_sums[:,None]) makes a (3,9999) array of bools, where the first row tells whether each sample of summed_rows is a 7, the 2nd row whether it is a 10, and the 3rd whether it is a 13. any(0) reduces along axis=0: for each column, it tells whether any of the rows holds a True value, which gives a (9999,) array of bools. nonzero returns the indices of the elements that are non-zero (True). So if you want to compare against more common_sums, the intermediate array will be (n,9999). – iblasi 2 hours ago
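
The three steps described in the last comment can be followed on a small example. This is only a sketch: the six-element array is made up, and the `step` names are illustrative, not part of anyone's answer.

```python
import numpy as np

a = np.array([7, 3, 10, 7, 13, 2])   # made-up stand-in for summed_rows
b = np.array([7, 10, 13])            # common_sums

step1 = a == b[:, None]    # (3, 6) bool array, one row per value in b
step2 = step1.any(0)       # (6,) bool array: True where any row matched
step3 = step2.nonzero()    # one-element tuple holding the index array
indices = step3[0]         # positions in a whose value is 7, 10 or 13
```

Because `any(0)` collapses the rows before the indices are extracted, `indices` is already in ascending order.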
0

Use np.isin:

import numpy as np
summed_rows = np.random.randint(low=1, high=14, size=9999)
common_sums = np.array([7, 10, 13])
new_array = np.where(np.isin(summed_rows, common_sums))
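
Note that np.where with a single argument returns a tuple of index arrays, so new_array above is a one-element tuple. A small sketch of getting a plain 1-D index array instead, using made-up data and np.flatnonzero as one possible alternative:

```python
import numpy as np

summed_rows = np.array([7, 3, 10, 7, 13, 2])   # small made-up sample
common_sums = np.array([7, 10, 13])

mask = np.isin(summed_rows, common_sums)   # boolean mask, same shape as summed_rows
as_tuple = np.where(mask)                  # one-element tuple of index arrays
indices = np.flatnonzero(mask)             # the same indices as a plain 1-D array
matched = summed_rows[indices]             # values found at those positions
```

Either `as_tuple[0]` or `indices` can be used to index back into summed_rows.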
