Take the 2-minute tour ×
Code Review Stack Exchange is a question and answer site for peer programmer code reviews. It's 100% free, no registration required.

I am confused by an algorithm for counting the number of pairs of N random points which has a distance closer than d = sqrt((p1.x-p2.x)^2 + (p1.y-p2.y)^2) The general approach is straightforward:

   struct point p[N] = .......
   for(i = 0; i < N; i++)
      for(j = i+1; j < N; j++)
        if(distance(p[i], p[j]) < d) cnt++;

Running time is O(N^2).

The improved algorithm is as follows, which is based on 2-dimensional array of lists

typedef struct node *link;
struct node {struct point p; link next};

link **grid; int G; float d; int cnt = 0;

gridinsert(float x, float y)
{
  int i, j;
  link s;
  int X = x * G + 1;
  int Y = y * G + 1;
  link t = malloc(sizeof(link));

  t->p.x = x; t->p.y = y;

  for(i = X - 1; i <= X + 1; i++)
    for(i = Y - 1; j <= Y + 1; j++)
      for(s = grid[i][j]; s != NULL; s = s->next)
        if(distance(s->p, t->p) < d) cnt++;

  t->next = grid[X][Y];
  grid[X][Y] = t;
}

float randFloat() {return 1.0*rand()/RAND_MAX};

void main()
{
  int i, j, N = atoi(argv[1]);
  d = atof(argv[2]); G = 1/d;

  grid = malloc2d(G+2, G+2); //malloc2d is a function to allocate memory for 2D array
  for(i = 0; i < G+2; i++)
    for(j = 0;  < G+2; j++)
      grid[i][j] = NULL;

  for(i = 0; i < N; i++)
    gridinsert(randFloat(), randFloat());
  printf("%d\n", cnt);
}

How to understand this algorithm and what is the running time now?

share|improve this question

closed as off-topic by 200_success, Jamal Mar 7 '14 at 4:35

If this question can be reworded to fit the rules in the help center, please edit the question.

1  
Just as an aside, it's worth noting that for this it's often worthwhile to work with the square of the distance. Squaring the distance lets you avoid taking the square roots in all the measurements. This doesn't affect the computational complexity, but can still give a substantial improvement in speed. –  Jerry Coffin Aug 16 '11 at 16:24
1  
This question appears to be off-topic because it does not seem like you wrote the code in question, nor does it appear that you are seeking a code review. –  200_success Mar 7 '14 at 4:27

1 Answer 1

This algorithm constructs an integer-based coordinate system, a grid. The size of the grid depends on the parameter to the program which is stored in G. The algorithm stores each point to be compared in this coarse grid, each grid cell holding a list of points in that cell. So the program operates on a 2d array of lists, each list is a list of points in a grid cell, and we have a 2d array of grid cells. At the end of gridinsert(), the new point is inerted to the beginning of the list of its cell, where the cell is indexed by X, Y.

Before that insertion, the nested for-for-for-if block iterates through all the direct neighbours (X+/-1 and Y+/-1) of the point's cell, and counts all the points that are closer than d. The reason this works faster is because we only loop through those point in the directly neighboring cells instead of all the points (becuase the previous links where also not initialized).

So what gives us the guarantee that no point closer than d will be further away than by a single cell? It is because G is defined as G=1/d.

The running time of this algorithm will depend on the actual dataset. In the worst case it is still O(N^2), but in general it should be much better.

share|improve this answer

Not the answer you're looking for? Browse other questions tagged or ask your own question.