
I have two NumPy arrays: Y contains outputs and P contains the probability that each output is correct. Both have the shape (outputs, noOfAnswers), and in general outputs is much larger than noOfAnswers.

I select the two most probable answers in each row of P with:

chooseThem = np.argpartition(P,-2,axis=1)[:,-2:]
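As a small check of what this line returns (a sketch with made-up values): np.argpartition(P, -2, axis=1) guarantees that the last two columns of each row hold the indices of that row's two largest entries, but not that the pair is in sorted order, so the sketch sorts each pair before printing.

```python
import numpy as np

# Made-up probability matrix: 3 rows, 4 candidate answers each.
P = np.array([[0.1, 0.7, 0.3, 0.9],
              [0.5, 0.2, 0.8, 0.4],
              [0.6, 0.9, 0.1, 0.3]])

# Column indices of the two largest probabilities per row (order within the pair is unspecified).
top2 = np.argpartition(P, -2, axis=1)[:, -2:]

# Sort each row's pair so the output is deterministic.
top2_sorted = np.sort(top2, axis=1)
print(top2_sorted)  # [[1 3] [0 2] [0 1]] (printed over three lines)
```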

Now I want to create a new array YP of shape (outputs, 2) that contains only the values of Y selected by chooseThem. A for loop makes this straightforward, but it is too slow.

Here is an example of the slow approach with some artificial arrays:

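import numpy as np

# Artificial data: 1000 rows, 6 candidate answers per row
Y = 4*(np.random.rand(1000,6)-0.5)   # outputs in [-2, 2)
P = np.random.rand(1000,6)           # probabilities
# Column indices of the two most probable answers in each row
biggest2 = np.argpartition(P,-2,axis=1)[:,-2:]
YNew = np.zeros((1000,2))

# Slow: one Python-level assignment per element
for j in range(2):
    for i in range(1000):
        YNew[i,j] = Y[i,biggest2[i,j]]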

Does anyone have a suggestion for a fast way to create this new array?

1 Answer


Integer-array ("fancy") indexing selects these values without a Python loop:

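# Row index i repeated for both selected columns: shape (1000, 2)
dex = np.array([np.arange(1000),np.arange(1000)]).T
# Element [i, j] of the result is Y[dex[i, j], biggest2[i, j]] = Y[i, biggest2[i, j]]
YNew = Y[dex,biggest2]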
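The same selection can be written without building dex explicitly: a column vector of row indices of shape (n, 1) broadcasts against biggest2 of shape (n, 2). NumPy 1.15 and later also provide np.take_along_axis for exactly this "pick per-row columns" pattern. A sketch comparing both against the loop from the question, on artificial data of the same kind (the names YLoop, YBroadcast and YTake are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_answers = 1000, 6
Y = 4 * (rng.random((n_rows, n_answers)) - 0.5)
P = rng.random((n_rows, n_answers))
biggest2 = np.argpartition(P, -2, axis=1)[:, -2:]

# Loop version from the question, used as the reference result.
YLoop = np.zeros((n_rows, 2))
for j in range(2):
    for i in range(n_rows):
        YLoop[i, j] = Y[i, biggest2[i, j]]

# Row indices of shape (n_rows, 1) broadcast against biggest2 of shape (n_rows, 2).
rows = np.arange(n_rows)[:, None]
YBroadcast = Y[rows, biggest2]

# take_along_axis picks Y[i, biggest2[i, j]] along axis 1.
YTake = np.take_along_axis(Y, biggest2, axis=1)

print(np.array_equal(YLoop, YBroadcast), np.array_equal(YLoop, YTake))  # True True
```

The broadcasting form avoids allocating the full (n, 2) index array for the rows, and take_along_axis states the intent directly; both produce the same (n, 2) result as the loop.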

Some timings (old = loop method, new = indexing method):

1000 rows

%timeit new(Y,P,1000,biggest2)
The slowest run took 4.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 39.1 µs per loop

%timeit old(Y,P,1000,biggest2)
1000 loops, best of 3: 853 µs per loop

100000 rows

%timeit new(Y,P,100000,biggest2)
100 loops, best of 3: 4.49 ms per loop

%timeit old(Y,P,100000,biggest2)
10 loops, best of 3: 89.4 ms per loop
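The answer does not show the bodies of old and new, only their call signature (Y, P, n, biggest2). The sketch below is a hypothetical reconstruction of both, wrapped in a plain timeit harness so the comparison can be rerun outside IPython; the row count is an assumption chosen so the loop version finishes quickly, and the absolute timings will depend on the machine.

```python
import timeit
import numpy as np

# Hypothetical reconstruction of the loop method (P is unused, mirroring the shown signature).
def old(Y, P, n, biggest2):
    YNew = np.zeros((n, 2))
    for j in range(2):
        for i in range(n):
            YNew[i, j] = Y[i, biggest2[i, j]]
    return YNew

# Hypothetical reconstruction of the indexing method from the answer.
def new(Y, P, n, biggest2):
    dex = np.array([np.arange(n), np.arange(n)]).T
    return Y[dex, biggest2]

n = 20000  # assumed size, smaller than the answer's 100000 to keep the run short
Y = 4 * (np.random.rand(n, 6) - 0.5)
P = np.random.rand(n, 6)
biggest2 = np.argpartition(P, -2, axis=1)[:, -2:]

# Both methods must agree before timing them.
assert np.array_equal(old(Y, P, n, biggest2), new(Y, P, n, biggest2))

for f in (old, new):
    # Best of 3 repeats of 3 calls each, reported per call.
    t = min(timeit.repeat(lambda: f(Y, P, n, biggest2), repeat=3, number=3)) / 3
    print(f"{f.__name__}: {t * 1e3:.2f} ms per call")
```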
