Intro - Nearest neighbour comparison between two arrays is something that can be done very efficiently when only 1 variable per array is involved, because it can be done by sorting and then performing a rolling comparison. However, for a 2 variable comparison this sort operation cannot be done anymore and another smart way has to be thought of. Hence the challenge:
Challenge - The challenge is to write a code that can, for each row in array A, find a row in array B such that (A[i,1]-B[j,1])^2+(A[i,2]-B[j,2])^2 is minimal and that then stores the corresponding row number j. The perfect minimum (0) is a perfect match of A[i,1] with B[j,1] and of A[i,2] with B[j,2].
As a code example I have a very clunky loop written in R
N=nrow(SetA)
for (i in 1:N){
#calculate vector with distance of row i to the entire SetB
vec=with(SetB,(SetA[i,x]-x)^2+(SetA[i,y]-y)^2)
#calculate minimum of that vector and store the index
indexVec[i,ind:=which.min(vec)]
}
Some sidenotes - If it matters (not sure it will) assume that both arrays have 10k rows. Ties can be decided at 'random' e.g. determined by processing sequence or whatever way is more convenient in your code.
Example in/out - For clarity, here is the sample input
SetA=[1 3;2 4;3 5;4 6;5 7]
SetB=[2 2.5; 4 3.1; 2 2.0; 3 3.0; 6 5.0;]
and the expected sample output
indexVec=[1;4;4;5;5]
where the values in indexVec indicate where the nearest neighbour of row i in SetA is located in SetB.
Fastest code (as tested on my i5-4210U) wins. The testset will consist of 2 arrays of 10k rows and 2 columns filled with floating point numbers between -1E10 and +1E10. Of course seeded with the same seed to remove any unfairness.
Feel free to use whatever language, but I am also very interested in an optimized version of the R code above.