Speed up array query in Numpy/Python

Question

I have an array of points (called points), consisting of ~30000 x,y, and z values. I also have a separate array of points (called vertices), about ~40000 x,y, and z values. The latter array indexes the lower-front-left corners of some cubes of side length size. I would like to find out which points reside in which cubes, and how many points reside in each cube. I wrote a loop to do this, which works like this:

for i in xrange(len(vertices)):        
    cube=((vertices[i,0]<= points[:,0]) & 
    (points[:,0]<(vertices[i,0]+size)) & 
    (vertices[i,1]<= points[:,1]) & 
    (points[:,1] < (vertices[i,1]+size)) &
    (vertices[i,2]<= points[:,2]) & 
    (points[:,2] < (vertices[i,2]+size))
    )
    numpoints[i]=len(points[cube])

(The loop is order the individual cubes, and the "cube" creates a boolean array of indices.) I then store points[cube] somewhere, but this isn't what's slowing me down; it's the creating of "cube=".

I would like to speed this loop up (it takes tens of seconds to complete on a macbook pro). I tried rewriting the "cube=" part in C, as follows:

for i in xrange(len(vertices)):
    cube=zeros(pp, dtype=bool)
    code="""
            for (int j=0; j<pp; ++j){

                if (vertices(i,0)<= points(j,0))
                 if (points(j,0) < (vertices(i,0)+size))
                  if (vertices(i,1)<= points(j,1))
                   if (points(j,1) < (vertices(i,1)+size))
                    if (vertices(i,2)<= points(j,2))
                     if (points(j,2) < (vertices(i,2)+size))
                      cube(j)=1;
            }
        return_val = 1;"""

    weave.inline(code,
    ['vertices', 'points','size','pp','cube', 'i']) 
    numpoints[i]=len(points[cube])

This sped it up by a bit more than a factor of two. Rewriting both loops in C actually made it only slightly faster than the original numpy-only version, because of the frequent references to the array objects necessary to keep track of which points are in which cubes. I suspect it's possible to do this much more rapidly, and that I'm missing something. Can anyone suggest how to speed this up?? I'm new to numpy/python, and thanks in advance.

HYRY · Accepted Answer · 2012-02-17 04:49:38Z

You can use scipy.spatial.cKDTree to speedup this kind of calculation.

Here is the code:

import time
import numpy as np

#### create some sample data ####
np.random.seed(1)

V_NUM = 6000
P_NUM = 8000

size = 0.1

vertices = np.random.rand(V_NUM, 3)
points = np.random.rand(P_NUM, 3)

numpoints = np.zeros(V_NUM, np.int32)

#### brute force ####
start = time.clock()
for i in xrange(len(vertices)):        
    cube=((vertices[i,0]<= points[:,0]) & 
    (points[:,0]<(vertices[i,0]+size)) & 
    (vertices[i,1]<= points[:,1]) & 
    (points[:,1] < (vertices[i,1]+size)) &
    (vertices[i,2]<= points[:,2]) & 
    (points[:,2] < (vertices[i,2]+size))
    )
    numpoints[i]=len(points[cube])

print time.clock() - start

#### KDTree ####
from scipy.spatial import cKDTree
center_vertices = vertices + [[size/2, size/2, size/2]]
start = time.clock()
tree_points = cKDTree(points)
_, result = tree_points.query(center_vertices, k=100, p = np.inf, distance_upper_bound=size/2)
numpoints2 = np.zeros(V_NUM, np.int32)
for i, neighbors in enumerate(result):
    numpoints2[i] = np.sum(neighbors!=P_NUM)

print time.clock() - start
print np.all(numpoints == numpoints2)

Change the cube corner position to center position first.

center_vertices = vertices + [[size/2, size/2, size/2]]

Create a cKDTree from points

tree_points = cKDTree(points)

Do query, k is the the number of nearest neighbors to return, p=np.inf means maximum-coordinate-difference distance, distance_upper_bound is the max distance.

_, result = tree_points.query(center_vertices, k=100, p = np.inf, distance_upper_bound=size/2)

The output is:

2.04113164434
0.11087783696
True

If there are more then 100 points in a cube, you can check this by neighbors[-1] == P_NUM in the for loop, and do a k=1000 query for these vertices.

asked	1 year ago
viewed	255 times
active	1 year ago

Speed up array query in Numpy/Python

1 Answer

Your Answer

Not the answer you're looking for? Browse other questions tagged python arrays performance numpy vectorization or ask your own question.

Speed up array query in Numpy/Python

1 Answer

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged python arrays performance numpy vectorization or ask your own question.

Related