Code Review Stack Exchange is a question and answer site for peer programmer code reviews.

I wrote a function to calculate the gamma coefficient of a clustering. The bottleneck is the comparison of values from dist_within to dist_between. To speed this up, I tried to adapt the function and compile it with Cython (I have dealt with C only a few times). But I don't know how to iterate rapidly over NumPy arrays, or whether it's possible at all to do it faster than

for i in range(len(arr)):
    arr[i]

I thought I could use a pointer to the array data, and indeed the code runs in only half the time. However, pointer1[i] and pointer2[j] in cdef unsigned int countlower don't give me the expected values from the arrays. So how do I iterate over an array properly and quickly? And where else can improvements be made, even if in this case they would not make much difference to the runtime?
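For reference, the pairwise count the bottleneck computes can be written as a single broadcast comparison in NumPy (a vectorized baseline to check the Cython versions against; `countlower_numpy` is a hypothetical name, not part of the original code):

```python
import numpy as np

def countlower_numpy(v1, v2):
    """Count pairs (i, j) with v1[i] < v2[j] using a broadcast comparison."""
    # v1[:, None] has shape (n1, 1) and v2[None, :] has shape (1, n2);
    # the comparison broadcasts to an (n1, n2) boolean matrix.
    return int(np.sum(v1[:, None] < v2[None, :]))

v1 = np.array([1.0, 3.0])
v2 = np.array([2.0, 4.0])
count = countlower_numpy(v1, v2)  # pairs 1<2, 1<4, 3<4
```

This allocates an n1 × n2 temporary, so for large inputs the loop-based Cython version can still win on memory, but it is a handy correctness oracle.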

# cython: profile=True
import cython

import numpy as np
cimport numpy as np

from scipy.spatial.distance import squareform

DTYPE = np.float64
DTYPEint = np.int64

ctypedef np.float64_t DTYPE_t
ctypedef np.int64_t DTYPEint_t

@cython.profile(False)
cdef unsigned int countlower(np.ndarray[DTYPE_t, ndim=1] vec1,
                             np.ndarray[DTYPE_t, ndim=1] vec2,
                             int n1, int n2):
    # Function output corresponds to np.bincount(v1 < v2)[1]

    assert vec1.dtype == DTYPE and vec2.dtype == DTYPE

    cdef unsigned int i, j
    cdef unsigned int trues = 0
    # NOTE: vec1.data holds float64 values; casting to unsigned int*
    # reinterprets the raw bytes, which is why the indexed values look wrong.
    cdef unsigned int* pointer1 = <unsigned int*> vec1.data
    cdef unsigned int* pointer2 = <unsigned int*> vec2.data

    for i in range(n1):
        for j in range(n2):
            if pointer1[i] < pointer2[j]:
                trues += 1

    return trues


def gamma(np.ndarray[DTYPE_t, ndim=2] Y, np.ndarray[DTYPEint_t, ndim=1] part):
    assert Y.dtype == DTYPE and part.dtype == DTYPEint

    if len(Y) != len(part):
        raise ValueError('Distance matrix and partition must have same shape')

    # defined locals
    cdef unsigned int K, c_label, n_, trues
    cdef unsigned int s_plus = 0
    cdef unsigned int s_minus = 0

    # assigned locals
    cdef np.ndarray n_in_ci = np.bincount(part)
    cdef int num_clust = len(n_in_ci) - 1
    cdef np.ndarray s = np.zeros(len(Y), dtype=DTYPE)

    # Partition should have at least two clusters
    K = len(set(part))
    if K < 2:
        return 0
    # Loop through clusters
    for c_label in range(1, K+1):
        dist_within = squareform(Y[part == c_label][:, part == c_label])
        dist_between = np.ravel(Y[part == c_label][:, part != c_label])
        n1 = len(dist_within)
        n2 = len(dist_between)

        trues = countlower(dist_within, dist_between, n1, n2)
        s_plus += trues
        s_minus += n1 * n2 - trues

    n_ = s_plus + s_minus

    return (<double>s_plus - <double>s_minus) / <double>n_ if n_ != 0 else 0
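The final statistic is just the normalized difference of the two counts; a tiny pure-Python sketch of that last step (`gamma_from_counts` is a hypothetical helper name, for illustration only):

```python
def gamma_from_counts(s_plus, s_minus):
    """Gamma = (s+ - s-) / (s+ + s-), or 0 when there are no comparable pairs."""
    n = s_plus + s_minus
    return (s_plus - s_minus) / n if n else 0

# A clustering in which every within-cluster distance beats every
# between-cluster distance gives the maximum value of 1.0.
g = gamma_from_counts(10, 0)
```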

Edit1: Passing just the pointers instead of the arrays to the time-critical function (>99% of the time is spent there) gave a ~10% speed-up. I guess some things just cannot be made faster.

from cython.parallel import parallel, prange

@cython.profile(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
cdef unsigned int countlower(double* v1, double* v2, int n1, int n2):
    '''Function output corresponds to np.bincount(v1 < v2)[1]'''
    cdef unsigned int trues = 0

    cdef Py_ssize_t i, j
    with nogil, parallel():
        # Only the outer loop is a prange; nested prange is not supported,
        # so the inner loop must be a plain range. The in-place += makes
        # trues a reduction variable across threads.
        for i in prange(n1):
            for j in range(n2):
                if v1[i] < v2[j]:
                    trues += 1
    return trues
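The prange version is a parallel reduction over independent rows of the comparison; the same decomposition can be sketched in plain Python with a thread pool (illustrative only, and without the nogil benefits the Cython version gets; `countlower_parallel` is a hypothetical name):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def countlower_parallel(v1, v2):
    """Sum per-row counts computed independently, mirroring the prange split."""
    def row_count(x):
        # Count how many entries of v2 exceed this single value of v1.
        return int((x < v2).sum())
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(row_count, v1))

v1 = np.array([1.0, 3.0])
v2 = np.array([2.0, 4.0])
total = countlower_parallel(v1, v2)
```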

1 Answer

When dealing with performance in Cython, I would suggest using the --annotate flag (or IPython with the cython magic, which also supports the annotate flag and allows quick iteration). It tells you which parts of your code may be slow by generating an HTML report with highlighted lines: the more yellow a line, the slower it potentially is. You can also click on a line to see the generated C. Generally, things get slow where the C code calls out into the Python world, e.g. for checking array bounds, negative indexing, or catching exceptions. So you might want to use the following decorators on your functions if you know you won't have out-of-bounds errors or negative indexing from the end:

@cython.boundscheck(False)
@cython.wraparound(False)

Keep in mind that if you do go out of bounds, you will segfault.
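In plain Python/NumPy terms, the two checks being disabled are the ones that turn a stray index into an exception or a from-the-end lookup (an illustrative sketch, not Cython code):

```python
import numpy as np

a = np.zeros(3)

# boundscheck: in Python an out-of-range index raises IndexError;
# with @cython.boundscheck(False) the same access is undefined behaviour.
try:
    a[5]
    caught = False
except IndexError:
    caught = True

# wraparound: a[-1] means "last element" in Python; with
# @cython.wraparound(False) a negative index also becomes undefined behaviour.
last = a[-1]
```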

This memview bench might give you ideas.

You might also want to look at NumPy views if you want to avoid copies and know the data won't be mutated (but I think that's the default now).

The memview bench is quite helpful; I understand it a little better now. The -a option is also helpful; however, checking for zero division produces some C code but affects speed insignificantly. If I work with NumPy types, should I always use something like np.float_t instead of double? I need a line profiler that works with Cython! –  embert Jan 7 at 16:19
 
My knowledge of Cython doesn't go that far. Though it seems to me that division in C is generally much slower than multiplication; a quick search gives me the @cython.cdivision(True) decorator. Looking at the definition of n_, I would say you can simplify the logic (x + trues - trues = x). –  Matt Jan 8 at 8:01