I have two NumPy arrays:

A = asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = asarray(['2', '4', '8', '16', '32'])

I want a function that takes A, B as parameters, and returns the index in B for each value in A, aligned with A, as efficiently as possible.

These are the outputs for the test case above:

indices = [1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]

I've tried exploring in1d(), where(), and nonzero() with no luck. Any help is much appreciated.

Edit: Arrays are strings.

Accepted answer:

I'm not sure how efficient this is but it works:

import numpy as np
A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])
idx_of_a_in_b=np.argmax(A[np.newaxis,:]==B[:,np.newaxis],axis=0)
print(idx_of_a_in_b)

from which I get:

[1 1 0 2 2 2 2 2 3 4 3 3 4]
    
This seems to be the one! Thanks! – Will Jul 16 '13 at 20:21
    
Note: this solution is quadratic in terms of the input size, which is not ideal. – Eelco Hoogendoorn Apr 2 at 15:44
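To make the quadratic cost mentioned in the comment concrete, here is a small sketch (the shortened A and the names matches, A_with_miss, and found are illustrative, not from the original answer). It shows that the broadcast comparison materializes one boolean per (B element, A element) pair, and that argmax cannot tell a missing value from a match at index 0:

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])

# Broadcasting builds a boolean matrix with one entry per (B, A) pair,
# so memory and time grow with len(A) * len(B).
matches = A[np.newaxis, :] == B[:, np.newaxis]
print(matches.shape)

idx = np.argmax(matches, axis=0)
print(idx)

# A value that is absent from B produces an all-False column, and argmax
# of an all-False column is 0, which looks like a match at index 0.
A_with_miss = np.asarray(['4', '64'])
matches_miss = A_with_miss[np.newaxis, :] == B[:, np.newaxis]
idx_miss = np.argmax(matches_miss, axis=0)
found = matches_miss.any(axis=0)  # explicit membership mask
print(idx_miss, found)
```

If A may contain values that are not in B, checking a mask such as found before trusting the indices is one way to guard against this.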

You can also do:

>>> np.digitize(A,B)-1
array([1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4])

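According to the docs you should be able to specify right=True and skip the minus one part, since each bin then includes its right edge (right=False is the default). This does not work for me, likely due to a version issue, as I do not have numpy 1.7. Note that digitize needs numeric input and a monotonic B.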
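As a rough sketch for numeric input only (np.digitize does not accept string arrays), the two calls below should agree when every value of A is exactly equal to some entry of a sorted B. The names idx_default and idx_right are illustrative:

```python
import numpy as np

A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
B = np.asarray([2, 4, 8, 16, 32])  # must be monotonic for digitize

# Default (right=False): bins[i-1] <= x < bins[i], so an exact match on
# B[k] lands in bin k + 1 and needs the "- 1" correction.
idx_default = np.digitize(A, B) - 1

# right=True: bins[i-1] < x <= bins[i], so an exact match on B[k]
# lands in bin k directly.
idx_right = np.digitize(A, B, right=True)

print(idx_default)
print(idx_right)
```

For values that fall strictly between two entries of B the two variants differ, so this equivalence only holds for exact matches.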

I'm not sure what you are doing with this, but a simple and very fast way to do this is:

>>> A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'])
>>> B,indices=np.unique(A,return_inverse=True)
>>> B
array(['16', '2', '32', '4', '8'],
      dtype='|S2')
>>> indices
array([3, 3, 1, 4, 4, 4, 4, 4, 0, 2, 0, 0, 2])

>>> B[indices]
array(['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32'],
      dtype='|S2')

The order will be different, but this can be changed if needed.
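One possible way to change the order, sketched under the assumption that every value in A also appears in B and that B has no duplicates: compute the inverse against the sorted unique values, then translate each unique value to its position in the caller's B. The names uniq, sorter, and pos_in_b are illustrative.

```python
import numpy as np

A = np.asarray(['4', '4', '2', '8', '8', '8', '8', '8',
                '16', '32', '16', '16', '32'])
B = np.asarray(['2', '4', '8', '16', '32'])  # desired ordering

# Sorted unique values of A, plus each element's index into them.
uniq, inverse = np.unique(A, return_inverse=True)

# Position of each unique value inside B (B itself need not be sorted).
sorter = np.argsort(B)
pos_in_b = sorter[np.searchsorted(B, uniq, sorter=sorter)]

# Re-express the inverse in terms of B's ordering.
indices = pos_in_b[inverse]
print(indices)
```

The expensive step still runs only over the unique values, so the remapping adds little work when A has many repeats.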

1  
You are implicitly relying on B being sorted. – Jaime Jul 10 '13 at 17:04
1  
But other than that, which is easily solved, e.g. as in my answer, this is faster than np.searchsorted, so +1. – Jaime Jul 10 '13 at 17:08
    
Let me further complicate matters by saying A and B are arrays of strings :( Apparently digitize() doesn't like that. – Will Jul 10 '13 at 21:03
1  
Is B always the unique array of A? – Ophion Jul 10 '13 at 22:10
    
Actually, yes. B is always the unique of A. – Will Jul 10 '13 at 22:16

For such things it is important to make lookups in B as fast as possible. A dictionary provides O(1) average lookup time. So, first of all, let us construct this dictionary:

>>> indices = dict((value,index) for index,value in enumerate(B))
>>> indices
{8: 2, 16: 3, 2: 0, 4: 1, 32: 4}

And then just go through A and find corresponding indices:

>>> [indices[item] for item in A]
[1, 1, 0, 2, 2, 2, 2, 2, 3, 4, 3, 3, 4]
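The two steps above can be wrapped into one function. This is only a sketch using the string data from the question; the function name index_in and the missing=-1 sentinel are assumptions added here, not part of the original answer.

```python
def index_in(values, reference, missing=-1):
    """Return, for each item of values, its index in reference.

    Items that do not occur in reference map to the missing sentinel.
    """
    # Build the value -> index table once; each lookup is then O(1) on average.
    lookup = {value: index for index, value in enumerate(reference)}
    return [lookup.get(item, missing) for item in values]


A = ['4', '4', '2', '8', '8', '8', '8', '8', '16', '32', '16', '16', '32']
B = ['2', '4', '8', '16', '32']

result = index_in(A, B)
print(result)
print(index_in(['4', '64'], B))  # '64' is not in B
```

Using dict.get with a sentinel avoids a KeyError when A contains a value that B lacks; drop the sentinel if a hard failure is preferable.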
    
Thanks, this is great. But, is there any way to do it in NumPy-C-happy-land? {dict: comprehension} seems a bit faster as well if we went with this route. Is there no nice NumPy way to do it without having to pass a dict around? – Will Jul 10 '13 at 10:13
1  
@Will If B is large, it's important to have O(1) lookup complexity. I'm not familiar with numpy, but a perfunctory search didn't yield any references to dict analogs in numpy. If B is small, it may be faster to do everything inside numpy. If so, wait for other answers; maybe someone will be able to come up with an all-in-numpy solution. – ovgolovin Jul 10 '13 at 10:20

I think you can do it with np.searchsorted:

>>> import numpy as np
>>> A = np.asarray([4, 4, 2, 8, 8, 8, 8, 8, 16, 32, 16, 16, 32])
>>> B = np.asarray([2, 8, 4, 32, 16])
>>> sort_b = np.argsort(B)
>>> idx_of_a_in_sorted_b = np.searchsorted(B, A, sorter=sort_b)
>>> idx_of_a_in_b = np.take(sort_b, idx_of_a_in_sorted_b)
>>> idx_of_a_in_b
array([2, 2, 0, 1, 1, 1, 1, 1, 4, 3, 4, 4, 3], dtype=int64)

Note that B is scrambled from your version, thus the different output. If some of the items in A are not in B (which you could check with np.all(np.in1d(A, B))) then the returned indices for those values will be crap, and you may even get an IndexError from the last line (if a value in A is larger than every value in B, searchsorted returns len(B) for it).
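A hedged sketch of how the missing-value problem could be handled: clamp the insertion points so the final lookup never goes out of range, then compare the looked-up values against A and mark mismatches. The function name index_in_b and the missing=-1 sentinel are hypothetical, and B is assumed to be non-empty and free of duplicates.

```python
import numpy as np


def index_in_b(A, B, missing=-1):
    """Index in B of each element of A, or missing where A[i] is not in B."""
    A = np.asarray(A)
    B = np.asarray(B)
    sorter = np.argsort(B)
    # Insertion points into the sorted view of B; can equal len(B).
    pos = np.searchsorted(B, A, sorter=sorter)
    # Clamp so that indexing sorter cannot raise IndexError.
    pos = np.minimum(pos, len(B) - 1)
    idx = sorter[pos]
    # Keep only positions where B really holds the requested value.
    return np.where(B[idx] == A, idx, missing)


B = [2, 8, 4, 32, 16]
print(index_in_b([4, 4, 2, 8, 16, 32], B))
print(index_in_b([4, 2, 64, 1], B))  # 64 and 1 are not in B
```

The same function works for string arrays, since argsort and searchsorted only need the elements to be orderable.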


The numpy_indexed package (disclaimer: I am its author) implements a solution along the same lines as Jaime's solution; but with a nice interface, tests, and a lot of related useful functionality:

import numpy_indexed as npi
print(npi.indices(B, A))
1  
You keep posting almost identical answers pointing at your utility, not being clear about your affiliation to the linked repo. To keep them from getting flagged as spam, you should take the steps described in: How can I link to an external resource in a community-friendly way? – Mogsdad Apr 2 at 15:36
    
Thanks for the heads-up, but are you sure these linked conditions apply? This isn't a 'product or website' I am linking, but rather an open-source project. Mentioning my authorship under those circumstances feels more like self-promotion than useful information. – Eelco Hoogendoorn Apr 2 at 15:46
    
Based on similar feedback I have decided to add a disclaimer; thanks again. – Eelco Hoogendoorn Apr 2 at 20:48
