Below is a naive algorithm to find nearest neighbours for a point in some n-dimensional space.
import numpy as np
import scipy.sparse as ss
# brute force method to get k nearest neighbours to point p
def knn_naive(k, p, X):
stacked_p = ss.vstack([p for n in range(X.get_shape()[0])])
D = X - stacked_p
D = np.sqrt(D.multiply(D).sum(1))
result = sorted(D)[:k]
return result
Types for arguments to knn_naive(...)
are:
# type(k): int
# type(p): <class 'scipy.sparse.csr.csr_matrix'>
# type(X): <class 'scipy.sparse.csr.csr_matrix'>
An example of dimensions would be:
X.shape: (100, 3004)
p.shape: (1, 3004)
Are there techniques to improve the above implementation in terms of computational time and/or memory?