Revisions to Fastest way to iterate over Numpy array

improve math formatting

Source Link

edited Nov 13, 2016 at 9:53

50.1k
3
130
210

You have made thisThis question unnecessarilyis difficult for us because:

You haveIt's not specifiedclear what the function countlower does. You shouldIt's always a good idea to write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)
It's not clear what the role of the arguments n1 and n2 is. YouThe code in the post only ever passpasses len(v1) for n1 and len(v2) for n2. So is that a requirement? Or do you expect to be ableis it sometimes possible to pass in other values?

the specification of the countlower function is Return the number of pairs i, j such that v1[i] < v2[j];
n1 is always len(v1) and n2 is always len(v2);
the Cython details are not essential to yourthe problem, and that it's OK to work in plain Python.

Here's my rewrite of yourthe countlower function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:

The vectorized countlower2 still takes O(n²)\$O(n^2)\$ time on arrays of length O(n)\$O(n)\$, because it has to compare every pair of elements. Is it possible to do better than that?

Suppose that I start by sorting the first array v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1] < y <= v[i]. Then y is greater than i elements from v. This position can be found in time O(log n)\$O(\log n)\$ using bisect.bisect_left, and so the algorithm as a whole has a runtime of O(n log n)\$O(n \log n)\$.

This shows the importance of finding the best algorithm, not just speeding up the algorithm you've got. Here an O(n log n)\$O(n \log n)\$ algorithm in plain Python beats a vectorized O(n²)\$O(n^2)\$ algorithm in NumPy.

You have made this question unnecessarily difficult for us because:

You have not specified what the function countlower does. You should always write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)
It's not clear what the role of the arguments n1 and n2 is. You only ever pass len(v1) for n1 and len(v2) for n2. So is that a requirement? Or do you expect to be able to pass in other values?

the specification of the countlower function is Return the number of pairs i, j such that v1[i] < v2[j];
n1 is always len(v1) and n2 is always len(v2);
the Cython details are not essential to your problem, and that it's OK to work in plain Python.

Here's my rewrite of your countlower function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:

The vectorized countlower2 still takes O(n²) time on arrays of length O(n), because it has to compare every pair of elements. Is it possible to do better than that?

Suppose that I start by sorting the first array v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1] < y <= v[i]. Then y is greater than i elements from v. This position can be found in time O(log n) using bisect.bisect_left, and so the algorithm as a whole has a runtime of O(n log n).

This shows the importance of finding the best algorithm, not just speeding up the algorithm you've got. Here an O(n log n) algorithm in plain Python beats a vectorized O(n²) algorithm in NumPy.

This question is difficult because:

It's not clear what the function countlower does. It's always a good idea to write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)
It's not clear what the role of the arguments n1 and n2 is. The code in the post only ever passes len(v1) for n1 and len(v2) for n2. So is that a requirement? Or is it sometimes possible to pass in other values?

the specification of the countlower function is Return the number of pairs i, j such that v1[i] < v2[j];
n1 is always len(v1) and n2 is always len(v2);
the Cython details are not essential to the problem, and that it's OK to work in plain Python.

Here's my rewrite of the countlower function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:

The vectorized countlower2 still takes \$O(n^2)\$ time on arrays of length \$O(n)\$, because it has to compare every pair of elements. Is it possible to do better than that?

Suppose that I start by sorting the first array v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1] < y <= v[i]. Then y is greater than i elements from v. This position can be found in time \$O(\log n)\$ using bisect.bisect_left, and so the algorithm as a whole has a runtime of \$O(n \log n)\$.

This shows the importance of finding the best algorithm, not just speeding up the algorithm you've got. Here an \$O(n \log n)\$ algorithm in plain Python beats a vectorized \$O(n^2)\$ algorithm in NumPy.

answer questions from comments

Source Link

edited Jan 28, 2014 at 15:49

Gareth Rees

50.1k
3
130
210

5. Answers to your questions

In comments, you asked:

"What does vectorizing mean?" Please read the "What is NumPy?" section of the NumPy documentation, in particular the section starting:

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” (in optimized, pre-compiled C code).

"What is meshgrid?" Please read the documentation for numpy.meshgrid.

I use meshgrid to create a NumPy array grid containing all pairs of elements x, y where x is an element of v and y is an element of w. Then I apply the < function to those pairs, getting an array of Booleans, which I sum. Try it out in the interactive interpreter and see for yourself:

    >>> import numpy as np
    >>> v = [2, 4, 6]
    >>> w = [1, 3, 5]
    >>> np.meshgrid(v, w)
    [array([[2, 4, 6],
           [2, 4, 6],
           [2, 4, 6]]), array([[1, 1, 1],
           [3, 3, 3],
           [5, 5, 5]])]
    >>> _[0] < _[1]
    array([[False, False, False],
           [ True, False, False],
           [ True,  True, False]], dtype=bool)
    >>> np.sum(_)
    3

5. Answers to your questions

In comments, you asked:

"What does vectorizing mean?" Please read the "What is NumPy?" section of the NumPy documentation, in particular the section starting:

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” (in optimized, pre-compiled C code).

"What is meshgrid?" Please read the documentation for numpy.meshgrid.

I use meshgrid to create a NumPy array grid containing all pairs of elements x, y where x is an element of v and y is an element of w. Then I apply the < function to those pairs, getting an array of Booleans, which I sum. Try it out in the interactive interpreter and see for yourself:

    >>> import numpy as np
    >>> v = [2, 4, 6]
    >>> w = [1, 3, 5]
    >>> np.meshgrid(v, w)
    [array([[2, 4, 6],
           [2, 4, 6],
           [2, 4, 6]]), array([[1, 1, 1],
           [3, 3, 3],
           [5, 5, 5]])]
    >>> _[0] < _[1]
    array([[False, False, False],
           [ True, False, False],
           [ True,  True, False]], dtype=bool)
    >>> np.sum(_)
    3

Source Link

answered Jan 28, 2014 at 11:59

Gareth Rees

50.1k
3
130
210

1. Introduction

You have made this question unnecessarily difficult for us because:

You have not specified what the function countlower does. You should always write a docstring for a function, specifying what it does, what arguments it takes, and what it returns. (And test cases are always appreciated.)
It's not clear what the role of the arguments n1 and n2 is. You only ever pass len(v1) for n1 and len(v2) for n2. So is that a requirement? Or do you expect to be able to pass in other values?

I am going to assume in what follows that:

the specification of the countlower function is Return the number of pairs i, j such that v1[i] < v2[j];
n1 is always len(v1) and n2 is always len(v2);
the Cython details are not essential to your problem, and that it's OK to work in plain Python.

Here's my rewrite of your countlower function. Note the docstring, the doctest, and the simple implementation, which loops over the sequence elements rather than their indices:

def countlower1(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower1(list(range(0, 200, 2)), list(range(40, 140)))
    4500

    """
    return sum(x < y for x in v for y in w)

And here's a 1000-element test case, which I'll use in the rest of this answer to compare the performance of various implementations of this function:

>>> v = np.array(list(range(0, 2000, 2)))
>>> w = np.array(list(range(400, 1400)))
>>> from timeit import timeit
>>> timeit(lambda:countlower1(v, w), number=1)
8.449613849865273

2. Vectorize

The whole reason for using NumPy is that it enables you to vectorize operations on arrays of fixed-size numeric data types. If you can successfully vectorize an operation, then it executes mostly in C, avoiding the substantial overhead of the Python interpreter.

Whenever you find yourself iterating over the elements of an array, then you're not getting any benefit from NumPy, and this is a sign that it's time to rethink your approach.

So let's vectorize the countlower function. This is easy using a sparse numpy.meshgrid:

import numpy as np

def countlower2(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower2(np.arange(0, 2000, 2), np.arange(400, 1400))
    450000

    """
    grid = np.meshgrid(v, w, sparse=True)
    return np.sum(grid[0] < grid[1])

Let's see how fast that is on the 1000-element test case:

>>> timeit(lambda:countlower2(v, w), number=1)
0.005706002004444599

That's about 1500 times faster than countlower1.

3. Improve the algorithm

The vectorized countlower2 still takes O(n²) time on arrays of length O(n), because it has to compare every pair of elements. Is it possible to do better than that?

Suppose that I start by sorting the first array v. Then consider an element y from the second array w, and find the point where y would fit into the sorted first array, that is, find i such that v[i - 1] < y <= v[i]. Then y is greater than i elements from v. This position can be found in time O(log n) using bisect.bisect_left, and so the algorithm as a whole has a runtime of O(n log n).

Here's a straightforward implementation:

from bisect import bisect_left

def countlower3(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower3(list(range(0, 2000, 2)), list(range(400, 1400)))
    450000

    """
    v = sorted(v)
    return sum(bisect_left(v, y) for y in w)

This implementation is about three times faster than countlower3 on the 1000-element test case:

>>> timeit(lambda:countlower3(v, w), number=1)
0.0021441911812871695

This shows the importance of finding the best algorithm, not just speeding up the algorithm you've got. Here an O(n log n) algorithm in plain Python beats a vectorized O(n²) algorithm in NumPy.

4. Vectorize again

Now we can vectorize the improved algorithm, using numpy.searchsorted:

import numpy as np

def countlower4(v, w):
    """Return the number of pairs i, j such that v[i] < w[j].

    >>> countlower4(np.arange(0, 20000, 2), np.arange(4000, 14000))
    45000000

    """
    return np.sum(np.searchsorted(np.sort(v), w))

And this is six times faster still:

>>> timeit(lambda:countlower4(v, w), number=1)
0.0003434771206229925

Stack Exchange Network

Return to Answer

5. Answers to your questions

5. Answers to your questions

1. Introduction

2. Vectorize

3. Improve the algorithm

4. Vectorize again