I'm a Python beginner (more used to coding in R) and I'd like to optimize a function.
Two arrays are filled with integers from 0 to 100. A number can't appear twice in a row, and each row is stored in ascending order.
Array1: nrow = 100 000; ncol = 5
Array2: nrow = 50 000; ncol = 5
For each row of Array1 and each row of Array2, I need to count the number of values they have in common and store the result in a 3rd array.
Array3: nrow = 100 000; ncol = 50 000
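To make the expected result concrete, here is a minimal sketch on tiny made-up inputs. The names `small1`, `small2` and `expected` are hypothetical and only there for illustration; the values are not taken from my real data.

```python
import numpy as np

# Hypothetical tiny inputs, made up only to illustrate the expected output.
# Each row is sorted and contains no repeated value.
small1 = np.array([[1, 2, 3, 4, 5],
                   [10, 20, 30, 40, 50]])
small2 = np.array([[1, 3, 5, 7, 9],
                   [20, 40, 60, 80, 99],
                   [6, 7, 8, 9, 11]])

# expected[i, j] = number of values shared by row i of small1 and row j of small2
expected = np.array(
    [[len(set(r1) & set(r2)) for r2 in small2] for r1 in small1],
    dtype=np.int8,
)

print(expected)
# [[3 0 0]
#  [0 2 0]]
```

So the result has one row per row of the first array and one column per row of the second array.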
Here is the current function, with a smaller array2 (50 rows instead of 50 000):
import numpy as np

array1 = np.random.randint(0, 100, (100000, 5))
array2 = np.random.randint(0, 100, (50, 5))

def Intersection(array1, array2):
    # One row per row of array1, one column per row of array2
    counts = np.empty([array1.shape[0], array2.shape[0]], dtype=np.int8)
    for i in range(array1.shape[0]):
        for j in range(array2.shape[0]):
            # Number of values shared by row i of array1 and row j of array2
            counts[i, j] = len(set(array1[i]).intersection(array2[j]))
    return counts
import time

start = time.time()
Intersection(array1, array2)
end = time.time()
print(end - start)
23.46 sec. So it should take hours if array2 has 50 000 rows.
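As a rough back-of-the-envelope check of that estimate, the sketch below extrapolates the measured time and also computes the size of the full result array. It assumes the run time grows linearly with the number of rows in array2 and that the result uses one byte per entry (int8); the variable names are only illustrative.

```python
# Timing measured above with 50 rows in array2.
measured_seconds = 23.46
measured_rows = 50
target_rows = 50000

# Assumption: run time scales linearly with the number of rows in array2.
estimated_seconds = measured_seconds * target_rows / measured_rows
estimated_hours = estimated_seconds / 3600.0

# Size of the full 100 000 x 50 000 result with one byte (int8) per entry.
result_bytes = 100000 * target_rows * 1
result_gigabytes = result_bytes / 1e9

print(round(estimated_hours, 1))  # about 6.5 hours
print(result_gigabytes)           # 5.0 (GB)
```

So besides the run time, the full result would need about 5 GB of memory even with int8.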
How can I optimize this function by keeping it simple to understand?