Fast way to find index of array in array of arrays

Question

Suppose I have a numpy array of arrays of length 4:

In [41]: arr
Out[41]:
array([[  1,  15,   0,   0],
       [ 30,  10,   0,   0],
       [ 30,  20,   0,   0],
       ...,
       [104, 139, 146,  75],
       [  9,  11, 146,  74],
       [  9, 138, 146,  75]], dtype=uint8)

I want to know:

Is it true that arr includes [1, 2, 3, 4]?
If it true what index of [1, 2, 3, 4] in arr?

I want to find out it as fast as it possible.

Suppose arr contains 8550420 elements. I've checked several methods with timeit:

Just for checking without getting index: any(all([1, 2, 3, 4] == elt) for elt in arr). It tooks 15.5 sec in average on 10 runs on my machine
for-based solution:

for i,e in enumerate(arr): if list(e) == [1, 2, 3, 4]: break

It tooks about 5.7 secs in average

Does exists some faster solutions, for example numpy based?

if you're not concerned about extra memory you could create dictionary where keys are tuples created from your lists — Roman Pekar, Jul 23 '13 at 13:12
But will it save some time? It will take time to make dictionary of tuples from my array. — petRUShka, Jul 23 '13 at 13:14
well it depends, if you perform your search more than once, than it definitely will save you some time — Roman Pekar, Jul 23 '13 at 13:15
I think this answer might help - stackoverflow.com/a/17797247/2452770. Using where in conjunction with all is probably what gives you the fastest find. — A.Wan, Jul 23 '13 at 13:37

unutbu · Accepted Answer · 2013-07-23 14:03:49Z

up vote 6 down vote accepted

This is Jaime's idea, I just love it:

import numpy as np

def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    http://stackoverflow.com/a/16216866/190597 (Jaime)"""    
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def find_index(arr, x):
    arr_as1d = asvoid(arr)
    x = asvoid(x)
    return np.nonzero(arr_as1d == x)[0]


arr = np.array([[  1,  15,   0,   0],
                [ 30,  10,   0,   0],
                [ 30,  20,   0,   0],
                [1, 2, 3, 4],
                [104, 139, 146,  75],
                [  9,  11, 146,  74],
                [  9, 138, 146,  75]], dtype='uint8')

arr = np.tile(arr,(1221488,1))
x = np.array([1,2,3,4], dtype='uint8')

print(find_index(arr, x))

yields

[      3      10      17 ..., 8550398 8550405 8550412]

The idea is to view each row of the array as a string. For example,

In [15]: x
Out[15]: 
array([^A^B^C^D], 
      dtype='|V4')

The strings look like garbage, but they are really just the underlying data in each row viewed as bytes. You can then compare arr_as1d == x to find which rows equal x.

There is another way to do it:

def find_index2(arr, x):
    return np.where((arr == x).all(axis=1))[0]

but it turns out to be not as fast:

In [34]: %timeit find_index(arr, x)
1 loops, best of 3: 209 ms per loop

In [35]: %timeit find_index2(arr, x)
1 loops, best of 3: 370 ms per loop

edited Jul 23 '13 at 14:03

answered Jul 23 '13 at 13:36

unutbu
224k15318476

1

I don't love it. Views can be a very nice trick, but you have to be careful about contiguiuty and if you use this on a float array you get -0. != 0. (does not matter for these uint8, but...). Is this even any faster then just using arr.all for each row? The view trick makes more sense to me if you want to use sorting methods because you need to find the index often. – seberg Jul 23 '13 at 13:51

Well, loving is perhaps too strong a word, but it seems to fill a need for which there is sometimes no other option. And in this case, it is faster than np.all(..., axis=1). – unutbu Jul 23 '13 at 13:55

Sorry, there is a ascontiguous array in there anyway, so that part is fine :) – seberg Jul 23 '13 at 13:57

No, you were correct; I just added it. And thanks for the warning regarding -0. != 0. – unutbu Jul 23 '13 at 13:58

Yes, find_index is very cool! – petRUShka Jul 23 '13 at 15:12

add a comment |

Roman Pekar · Answer 2 · 2013-07-23 13:19:40Z

up vote 0 down vote

If you perform search more than one time and you don't mind to use extra memory, you can create set from you array (I'm using list here, but it's almost the same code):

>>> elem = [1, 2, 3, 4]    
>>> elements = [[  1,  15,   0,   0], [ 30,  10,   0,   0], [1, 2, 3, 4]]
>>> index = set([tuple(x) for x in elements])
>>> True if tuple(elem) in index else False
True

answered Jul 23 '13 at 13:19

Roman Pekar
32k41844

Don't even need the conditional statement. tuple(... index should be fine on itself. ( I can't check though; on my phone) – Haidro Jul 23 '13 at 13:26

add a comment |

asked	1 year ago
viewed	104 times
active	1 year ago

current community

your communities

more stack exchange communities

Fast way to find index of array in array of arrays

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged python arrays search numpy multidimensional-array or ask your own question.

Linked

Hot Network Questions

current community

your communities

more stack exchange communities

Fast way to find index of array in array of arrays

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python arrays search numpy multidimensional-array or ask your own question.

Linked

Related

Hot Network Questions