I am looking for a way to run the following code much faster, ideally with something like dataframe.apply(func), which as far as I know is the fastest approach; iterating over rows/columns comes right behind it and is already about 3x slower.
The problem is twofold: how to set this up, AND how to save the results somewhere else (an embedded function might do that). I know the pandas function for ROLLING window regression is already optimized to its limit, but I was wondering how to get rid of the loop and any other \$O(N^k)\$ operations I might have missed.
Any help is greatly appreciated.
import pandas as pd
import numpy as np
periods = 1000
alt_pan_fondi_prices = pd.DataFrame(np.random.randn(periods, 4), index=pd.date_range('2011-1-1', periods=periods), columns=list('ABCD'))
indu = pd.DataFrame(np.random.randn(periods, 4), index=pd.date_range('2011-1-1', periods=periods), columns=list('ABCD'))
indu.columns = list('ABCD')
# some names to be used later
cols = ['fund'] + [("bench_" + str(i)) for i in list('ABCD')]
for item in alt_pan_fondi_prices.columns.values:
    to_infer = alt_pan_fondi_prices[item].dropna()
    indu = indu.loc[to_infer.index[0]:, :].dropna()
    dfBothPrices = pd.concat([to_infer, indu], axis=1)
    dfBothPrices = dfBothPrices.fillna(method='bfill')
    dfBothReturns = dfBothPrices.pct_change()
    dfBothReturns.columns = cols
    mask = cols[1:]
    # execute the OLS model
    model = pd.ols(y=dfBothReturns['fund'], x=dfBothReturns[mask], window=20)
    # I then need to store a whole bunch of stuff (alphas / betas / rsquared / etc) but I have this part safely taken care of
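
To make concrete what I mean by getting rid of per-window work, here is a rough sketch of one possible direction. It is not what pd.ols does internally, and I have not benchmarked it. It builds every 20-row window at once with NumPy's sliding_window_view (NumPy 1.20 or later) and solves the normal equations for all windows in a single batched call. The function name rolling_ols and the synthetic data are made up for illustration, and it assumes the inputs contain no NaNs (so the first row produced by pct_change would have to be dropped beforehand).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ols(y, X, window=20):
    """Hypothetical helper: rolling OLS with an intercept, one fit per full window.

    Returns (alpha, betas, r2) with one row per window. Assumes no NaNs.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Design matrix with an intercept column, shape (n, k + 1)
    A = np.column_stack([np.ones(len(y)), X])
    # Stack all windows: Aw is (n_windows, window, k + 1), yw is (n_windows, window)
    Aw = sliding_window_view(A, window, axis=0).transpose(0, 2, 1)
    yw = sliding_window_view(y, window)
    # Normal equations A'A b = A'y for every window, solved in one batched call
    AtA = np.einsum('nwi,nwj->nij', Aw, Aw)
    Aty = np.einsum('nwi,nw->ni', Aw, yw)
    coef = np.linalg.solve(AtA, Aty[..., None])[..., 0]
    # R-squared per window from residual and total sums of squares
    resid = yw - np.einsum('nwi,ni->nw', Aw, coef)
    ss_res = (resid ** 2).sum(axis=1)
    ss_tot = ((yw - yw.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return coef[:, 0], coef[:, 1:], 1.0 - ss_res / ss_tot


# Synthetic example data (assumption: stands in for one fund vs. four benchmarks)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = 0.1 + X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.01 * rng.standard_normal(200)
alpha, betas, r2 = rolling_ols(y, X, window=20)
print(alpha.shape, betas.shape)  # (181,) (181, 4)
```

The outer loop over funds would still be there, but each fund would need only one call, and the alphas / betas / rsquared come back as plain arrays that can be stored wherever needed. Solving the normal equations directly is less numerically robust than a QR or SVD based least-squares solve, so this only makes sense if the regressors are not close to collinear.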