1

I have a rather large NumPy array, with about 1 million elements. It contains about 8 distinct values, numbered 1-8.

Let's say that, given the number 2, I would like to recode all 2s to 1 and everything else to 0.

i.e.
2 ==> 1
1, 3, 4, 5, 6, 7, 8 ==> 0

Is there a Pythonic way to do this with NumPy?


[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8] ==> [0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0]

Thanks

2 Answers

5

That's the result of a == 2 for a NumPy array a:

>>> import numpy
>>> a = numpy.random.randint(1, 9, size=20)
>>> a
array([4, 5, 1, 2, 5, 7, 2, 5, 8, 2, 4, 6, 6, 1, 8, 7, 1, 7, 8, 7])
>>> a == 2
array([False, False, False,  True, False, False,  True, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False], dtype=bool)
>>> (a == 2).astype(int)
array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

If you want to change a in place, an efficient way to do so is to pass a as the out argument of numpy.equal(); the boolean result is then cast back to a's integer dtype:

>>> numpy.equal(a, 2, out=a)
array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
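
For an array of the size mentioned in the question, the two variants above might be sketched as follows. This is only an illustration under assumptions: the seeded random data, the names mask and recoded, and the choice of uint8 as a compact output dtype are not from the question.

```python
import numpy as np

# Assumed stand-in for the question's data: 1 million values in 1..8.
rng = np.random.default_rng(0)
a = rng.integers(1, 9, size=1_000_000)

# Out-of-place: the comparison yields a boolean mask, which is then
# converted to a small integer dtype (uint8 is an assumption, to save memory).
mask = a == 2
recoded = mask.astype(np.uint8)

# In-place: write the comparison result back into a itself.
# The boolean result is cast to a's integer dtype.
np.equal(a, 2, out=a)
```

After the in-place call, a holds only 0s and 1s and the original values are gone, so this variant only fits when the original data is no longer needed.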
4

I'd probably use np.where for this:

>>> import numpy as np
>>> a = np.array([[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]])
>>> np.where(a==2, 1, 0)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
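
One situation where np.where may be convenient is when the two output values are something other than 1 and 0. A small sketch using the array from the question; the name target and the replacement values 10 and -1 are made up for illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])
target = 2  # hypothetical name for the value to recode

# Same result as (a == target).astype(int)
recoded = np.where(a == target, 1, 0)

# Arbitrary output values: 10 where a equals target, -1 elsewhere
custom = np.where(a == target, 10, -1)
```

For the plain 1/0 case the comparison alone is enough; np.where only adds something once the outputs differ from True/False.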
2
  • Using numpy.where() on the result of a == 2 here seems redundant to me, and it is less efficient than simply not calling it. Any rationale for it?
    Commented Aug 9, 2012 at 12:28
  • Only that I tend to like solutions which are robust to perturbations in the values; no real motivation otherwise.
    – DSM
    Commented Aug 9, 2012 at 12:44
