Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

Suppose I have a numpy array of arrays of length 4:

In [41]: arr
Out[41]:
array([[  1,  15,   0,   0],
       [ 30,  10,   0,   0],
       [ 30,  20,   0,   0],
       ...,
       [104, 139, 146,  75],
       [  9,  11, 146,  74],
       [  9, 138, 146,  75]], dtype=uint8)

I want to know:

  1. Is it true that arr includes [1, 2, 3, 4]?
  2. If it true what index of [1, 2, 3, 4] in arr?

I want to find out it as fast as it possible.

Suppose arr contains 8550420 elements. I've checked several methods with timeit:

  1. Just for checking without getting index: any(all([1, 2, 3, 4] == elt) for elt in arr). It tooks 15.5 sec in average on 10 runs on my machine
  2. for-based solution:

    for i,e in enumerate(arr): if list(e) == [1, 2, 3, 4]: break

    It tooks about 5.7 secs in average

Does exists some faster solutions, for example numpy based?

share|improve this question
    
if you're not concerned about extra memory you could create dictionary where keys are tuples created from your lists –  Roman Pekar Jul 23 '13 at 13:12
    
But will it save some time? It will take time to make dictionary of tuples from my array. –  petRUShka Jul 23 '13 at 13:14
    
well it depends, if you perform your search more than once, than it definitely will save you some time –  Roman Pekar Jul 23 '13 at 13:15
    
only once per run –  petRUShka Jul 23 '13 at 13:24
    
I think this answer might help - stackoverflow.com/a/17797247/2452770. Using where in conjunction with all is probably what gives you the fastest find. –  A.Wan Jul 23 '13 at 13:37

2 Answers 2

up vote 6 down vote accepted

This is Jaime's idea, I just love it:

import numpy as np

def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    http://stackoverflow.com/a/16216866/190597 (Jaime)"""    
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def find_index(arr, x):
    arr_as1d = asvoid(arr)
    x = asvoid(x)
    return np.nonzero(arr_as1d == x)[0]


arr = np.array([[  1,  15,   0,   0],
                [ 30,  10,   0,   0],
                [ 30,  20,   0,   0],
                [1, 2, 3, 4],
                [104, 139, 146,  75],
                [  9,  11, 146,  74],
                [  9, 138, 146,  75]], dtype='uint8')

arr = np.tile(arr,(1221488,1))
x = np.array([1,2,3,4], dtype='uint8')

print(find_index(arr, x))

yields

[      3      10      17 ..., 8550398 8550405 8550412]

The idea is to view each row of the array as a string. For example,

In [15]: x
Out[15]: 
array([^A^B^C^D], 
      dtype='|V4')

The strings look like garbage, but they are really just the underlying data in each row viewed as bytes. You can then compare arr_as1d == x to find which rows equal x.


There is another way to do it:

def find_index2(arr, x):
    return np.where((arr == x).all(axis=1))[0]

but it turns out to be not as fast:

In [34]: %timeit find_index(arr, x)
1 loops, best of 3: 209 ms per loop

In [35]: %timeit find_index2(arr, x)
1 loops, best of 3: 370 ms per loop
share|improve this answer
1  
I don't love it. Views can be a very nice trick, but you have to be careful about contiguiuty and if you use this on a float array you get -0. != 0. (does not matter for these uint8, but...). Is this even any faster then just using arr.all for each row? The view trick makes more sense to me if you want to use sorting methods because you need to find the index often. –  seberg Jul 23 '13 at 13:51
    
Well, loving is perhaps too strong a word, but it seems to fill a need for which there is sometimes no other option. And in this case, it is faster than np.all(..., axis=1). –  unutbu Jul 23 '13 at 13:55
    
Sorry, there is a ascontiguous array in there anyway, so that part is fine :) –  seberg Jul 23 '13 at 13:57
    
No, you were correct; I just added it. And thanks for the warning regarding -0. != 0. –  unutbu Jul 23 '13 at 13:58
    
Yes, find_index is very cool! –  petRUShka Jul 23 '13 at 15:12

If you perform search more than one time and you don't mind to use extra memory, you can create set from you array (I'm using list here, but it's almost the same code):

>>> elem = [1, 2, 3, 4]    
>>> elements = [[  1,  15,   0,   0], [ 30,  10,   0,   0], [1, 2, 3, 4]]
>>> index = set([tuple(x) for x in elements])
>>> True if tuple(elem) in index else False
True
share|improve this answer
    
Don't even need the conditional statement. tuple(... index should be fine on itself. ( I can't check though; on my phone) –  Haidro Jul 23 '13 at 13:26

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.