
I'm trying to improve the performance of the code below. Eventually it will be run on much bigger arrays, but I thought I would start off with something simple that works, then look at where it is slow, optimise it, and then try it out at full size. Here is the original code:

#Minimum example with random variables
import numpy as np
import matplotlib.pyplot as plt

n=4
# Theoretical Travel Time to each station
ttable=np.array([1,2,3,4])
# Seismic traces, measured at each station
traces=np.random.random((n, 506))
dt=0.1
# Forward problem: add energy to each trace at the desired time from a given origin time
given_origin_time=1
for i in range(n):
    # Energy will arrive at the sample equivalent to origin time + travel time
    arrival_sample=int(round((given_origin_time+ttable[i])/dt))
    traces[i,arrival_sample]=2

# The aim is to find the origin time by trying each possible origin time and adding the energy up. 
# Where this "Stack" is highest is likely to be the origin time

# Find the maximum travel time
tmax=ttable.max()


# We pad the traces so that shifting by a travel time never runs off the end of the trace
traces=np.pad(traces,((0,0),(round(tmax/dt),round(tmax/dt))),'constant',constant_values=0)

#Available origin times to search for relative to the beginning of the trace
origin_times=np.linspace(-tmax,len(traces),len(traces)+round(tmax/dt))

# Create an empty array to fill with our stack
S=np.empty((origin_times.shape[0]))

# Loop over all the potential origin times
for l,otime in enumerate(origin_times):
    # Create some variables which we will sum up over all stations
    sum_point=0
    sqrr_sum_point=0
    # Loop over each station
    for m in range(n):
        # Find the appropriate travel time
        ttime=ttable[m] 
        # Grab the point on the trace that corresponds to this travel time + the origin time we are searching for
        point=traces[m,int(round((tmax+otime+ttime)/dt))]
        # Sum up the points
        sum_point+=point
        # Sum of the square of the points
        sqrr_sum_point+=point**2
    # Create the stack by taking the square of the sum divided by the sum of the squares, normalised by the number of stations
    S[l]=sum_point#**2/(n*sqrr_sum_point)

# Plot the output the peak should be at given_origin_time
plt.plot(origin_times,S)
plt.show()

I think the problem is that I don't understand the broadcasting and indexing of multidimensional arrays. After this I will be extending the dimensions to search for x, y, z, which would be given by increasing the dimension of ttable. I will probably try to implement either PyTables or np.memmap to help with the large arrays.

  • You don't show us ttable, so we can't easily help you. Also, please try shrinking your example code down to the bare minimum, with the fewest dimensions and variables required to demonstrate the issue. Commented Oct 23, 2014 at 9:49
  • I've edited the question to define some example arrays; I hope that helps. Let me know if there is anything else you need. Commented Oct 23, 2014 at 10:19
  • Your example is still far from minimal. I bet you could cut out half that code and still have a useful program for the purposes of solving your issue with looping. For example, you have S2, which seems like it probably doesn't add anything to the question, and you have a whole extra dimension that could be removed without harming the question. Commented Oct 23, 2014 at 10:46
  • The immediate problem is that origin_times_int+ttime doesn't work because one has 250 elements and the other has four. I'm not sure what you're trying to do with that code. Clearly we need to walk through smaller snippets of code before we try to tackle the monster. :) Commented Oct 23, 2014 at 10:55
  • I agree with @JohnZwinck. I swear I would love to understand what your code is doing. It seems to be some sort of convolution and/or statistics, but the variable names don't help much. Do you think you can improve them? Commented Oct 23, 2014 at 12:58

1 Answer


With some quick profiling, it appears that the line

point=traces[m,int(round((tmax+otime+ttime)/dt))]

is taking ~40% of the total program's runtime. Let's see if we can vectorize it a bit:

    ttime_inds = np.around((tmax + otime + ttable) / dt).astype(int)
    # Loop over each station
    for m in range(n):
        # Grab the point on the trace that corresponds to this travel time + the origin time we are searching for
        point=traces[m,ttime_inds[m]]

We noticed that the only thing changing in the loop (other than m) was ttime, so we pulled it out and vectorized that part using numpy functions.
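As a quick sanity check, the broadcast computation produces exactly the same indices as the per-station `round()` calls. Here is a tiny standalone demo with stand-in values (illustrative only, not the question's real data):

```python
import numpy as np

# Stand-in values mirroring the question's variables (assumed, for illustration)
ttable = np.array([1, 2, 3, 4])
tmax, otime, dt = 4.0, 1.0, 0.1

# Vectorized: one array operation covers all stations at once
ttime_inds = np.around((tmax + otime + ttable) / dt).astype(int)

# Scalar version, station by station, as in the original loop
loop_inds = [int(round((tmax + otime + t) / dt)) for t in ttable]

print(ttime_inds)  # [60 70 80 90], matching loop_inds
```

Both apply the same round-half-to-even rule to the same floating-point values, so they always agree.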

That was the biggest hotspot, but we can go a bit further and remove the inner loop entirely:

# Loop over all the potential origin times
for l,otime in enumerate(origin_times):
    ttime_inds = np.around((tmax + otime + ttable) / dt).astype(int)
    points = traces[np.arange(n),ttime_inds]
    sum_point = points.sum()
    sqrr_sum_point = (points**2).sum()
    # Create the stack by taking the square of the sum divided by the sum of the squares, normalised by the number of stations
    S[l]=sum_point#**2/(n*sqrr_sum_point)
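The `traces[np.arange(n), ttime_inds]` line uses NumPy's integer ("fancy") indexing: the two index arrays are paired element-wise, picking one column per row in a single operation. A minimal demo with a toy array (nothing from the question):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # [[0,1,2,3],[4,5,6,7],[8,9,10,11]]
rows = np.arange(3)               # one row index per "station"
cols = np.array([1, 3, 0])        # one column index per "station"

picked = a[rows, cols]            # pairs up as a[0,1], a[1,3], a[2,0]
print(picked)                     # [1 7 8]
```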

EDIT: If you want to remove the outer loop as well, we need to pull otime out:

ttime_inds = np.around((tmax + origin_times[:,None] + ttable) / dt).astype(int)

Then, we proceed as before, summing over the second axis:

points = traces[np.arange(n),ttime_inds]
sum_points = points.sum(axis=1)
sqrr_sum_points = (points**2).sum(axis=1)
S = sum_points # **2/(n*sqrr_sum_points)
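Putting it all together, here is a self-contained sketch of the fully vectorized pipeline, using small stand-in arrays whose shapes are chosen only to keep the indices in bounds (not the question's data):

```python
import numpy as np

n, T = 4, 5                                 # stations, candidate origin times (toy sizes)
traces = np.arange(n * 90).reshape(n, 90).astype(float)
ttable = np.array([1, 2, 3, 4])
origin_times = np.linspace(0, 0.4, T)
tmax, dt = 4.0, 0.1

# Broadcasting: (T,1) + (n,) -> (T,n), one sample index per (origin time, station) pair
ttime_inds = np.around((tmax + origin_times[:, None] + ttable) / dt).astype(int)

# The (n,) row indices broadcast against the (T,n) column indices -> (T,n) points
points = traces[np.arange(n), ttime_inds]

S = points.sum(axis=1)                      # one stack value per candidate origin time
```

Note that `points` has shape `(T, n)` and is materialized in full, which is where the extra memory goes when the number of candidate origin times is large.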

2 Comments

Excellent! Is there a way to remove the next loop? The plan is to make this run over quite large arrays, and to do the same thing for more dimensions.
Sure, I've updated my answer. This will make things faster, but remember that it will also use more memory.
