Calculating Euclidean norm for each vector in a sparse matrix

Question

Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space.

import numpy as np
import scipy.sparse as ss

# brute force method to get k nearest neighbours to point p
def knn_naive(k, p, X):
    stacked_p = ss.vstack([p for n in range(X.get_shape()[0])])
    D = X - stacked_p
    D = np.sqrt(D.multiply(D).sum(1))
    result = sorted(D)[:k]
    return result

Types for arguments to knn_naive(...) are:

# type(k): int
# type(p): <class 'scipy.sparse.csr.csr_matrix'>
# type(X): <class 'scipy.sparse.csr.csr_matrix'>

An example of dimensions would be:

X.shape: (100, 3004)
p.shape: (1, 3004)

Are there techniques to improve the above implementation in terms of computational time and/or memory?

Jaime · Accepted Answer · 2014-04-13 02:57:56Z

I wasn't aware that scipy's sparse matrices did not support broadcasting. What a shame that you can't simply do D = X - p! Your construction of stacked_p can be sped up quite a bit, but you need to access the internals of the CSR matrix directly. If you aren't aware of the details, the wikipedia article has a good description of the format.

In any case, you can construct stacked_p as follows:

rows, cols = X.shape
stacked_p = ss.csr_matrix((np.tile(p.data, rows), np.tile(p.indices, rows),
                           np.arange(0, rows*p.nnz + 1, p.nnz)), shape=X.shape)

With some dummy data:

import numpy as np
import scipy.sparse as sps

p = sps.rand(1, 1000, density=0.01, format='csr')

In [15]: %timeit sps.vstack([p for n in xrange(1000)]).tocsr()
10 loops, best of 3: 173 ms per loop

In [16]: %timeit sps.csr_matrix((np.tile(p.data, 1000), np.tile(p.indices, 1000),
                                 np.arange(0, 1000*p.nnz + 1, p.nnz)), (1000, 1000))
10000 loops, best of 3: 94.1 µs per loop

That is close to 2000x times faster...

asked	1 year ago
viewed	360 times
active	1 year ago

current community

your communities

more stack exchange communities

Calculating Euclidean norm for each vector in a sparse matrix

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged python performance matrix numpy clustering or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

Calculating Euclidean norm for each vector in a sparse matrix

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python performance matrix numpy clustering or ask your own question.

Related

Hot Network Questions