1. Introduction
You have made this question unnecessarily difficult for us because:
You have not specified what the function countlower
does. You should always write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)
It's not clear what the role of the arguments n1
and n2
is. You only ever pass len(v1)
for n1
and len(v2)
for n2
. So is that a requirement? Or do you expect to be able to pass in other values?
I am going to assume in what follows that:
the specification of the countlower
function is Return the number of pairs i, j such that v1[i] < v2[j]
;
n1
is always len(v1)
and n2
is always len(v2)
;
the Cython details are not essential to your problem, and that it's OK to work in plain Python.
Here's my rewrite of your countlower
function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:
def countlower1(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
4500
"""
return sum(x < y for x in v for y in w)
And here's a 1000-element test case, which I'll use in the rest of this answer to compare the performance of various implementations of this function:
>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273
2. Vectorize
The whole reason for using NumPy is that it enables you to vectorize operations on arrays of fixed-size numeric data types. If you can successfully vectorize an operation, then it executes mostly in C, avoiding the substantial overhead of the Python interpreter.
Whenever you find yourself iterating over the elements of an array, then you're not getting any benefit from NumPy, and this is a sign that it's time to rethink your approach.
So let's vectorize the countlower
function. This is easy using a sparse numpy.meshgrid
:
import numpy as np
def countlower2(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
450000
"""
grid = np.meshgrid(v, w, sparse=True)
return np.sum(grid[0] < grid[1])
Let's see how fast that is on the 1000-element test case:
>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599
That's about 1500 times faster than countlower1
.
3. Improve the algorithm
The vectorized countlower2
still takes O(n2) time on arrays of length O(n), because it has to compare every pair of elements. Is it possible to do better than that?
Suppose that I start by sorting the first array v
. Then consider an element y
from the second array w
, and find the point where y
would fit into the sorted first array, that is, find i
such that v[i - 1] < y <= v[i]
. Then y
is greater than i
elements from v
. This position can be found in time O(log n) using bisect.bisect_left
, and so the algorithm as a whole has a runtime of O(n log n).
Here's a straightforward implementation:
from bisect import bisect_left
def countlower3(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower3(list(range(0, 2000, 2)), list(range(400, 1400)))
450000
"""
v = sorted(v)
return sum(bisect_left(v, y) for y in w)
This implementation is about three times faster than countlower3
on the 1000-element test case:
>>> timeit(lambda:countlower3(v, w), number=1)
0.0021441911812871695
This shows the importance of finding the best algorithm, not just speeding up the algorithm you've got. Here an O(n log n) algorithm in plain Python beats a vectorized O(n2) algorithm in NumPy.
4. Vectorize again
Now we can vectorize the improved algorithm, using numpy.searchsorted
:
import numpy as np
def countlower4(v, w):
"""Return the number of pairs i, j such that v[i] < w[j].
>>> countlower4(np.arange(0, 20000, 2), np.arange(4000, 14000))
45000000
"""
return np.sum(np.searchsorted(np.sort(v), w))
And this is six times faster still:
>>> timeit(lambda:countlower4(v, w), number=1)
0.0003434771206229925