python - Assigning (or tying in) function results back to original data in pandas
I am struggling with extracting the regression coefficients once I complete the function call np.polyfit (actual code below). I am able to get a display of each coefficient, but I am unsure how to actually extract them for future use with my original data.
df=pd.read_csv('2_skews.csv')
Here is the head() of the data:
        date     expiry symbol  strike   vol
0  6/10/2015  1/19/2016    ibm      50  42.0
1  6/10/2015  1/19/2016    ibm      55  41.5
2  6/10/2015  1/19/2016    ibm      60  40.0
3  6/10/2015  1/19/2016    ibm      65  38.0
4  6/10/2015  1/19/2016    ibm      70  36.0
There are many symbols with many strikes across many days, and many expiry dates as well.
I have grouped the data by date, symbol and expiry, and then call the regression function like this:
df_reg=df.groupby(['date','symbol','expiry']).apply(regress)
I have a function that seems to work (it gives the proper coefficients), but I don't seem to be able to access them and tie them back to the original data.
def regress(df):
    y = df['vol']
    x = df['strike']
    z = p.polyfit(x, y, 4)
    return z
I am calling polyfit like this:
from numpy.polynomial import polynomial as p
The final results:
df_reg
date       symbol  expiry
5/19/2015  gs      1/19/2016    [-112.064833151, 6.76871521993, -0.11147562136...
                   3/21/2016    [-131.2914493, 7.16441276062, -0.1145534833, 0...
           ibm     1/19/2016    [211.458028147, -5.01236287512, 0.044819313514...
                   3/21/2016    [-34.1027973807, 3.16990194634, -0.05676206572...
6/10/2015  gs      1/19/2016    [50.3916788503, 0.795484227762, -0.02701849495...
                   3/21/2016    [31.6090441114, 0.851878910113, -0.01972772270...
           ibm     1/19/2016    [-13.6159660078, 3.23002791603, -0.06015739505...
                   3/21/2016    [-51.6709051223, 4.80288173687, -0.08600312989...
dtype: object
The top result has the functional form:
y = -0.000002x^4 + 0.000735x^3 - 0.111476x^2 + 6.768715x - 112.064833
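Note that the two polyfit functions mentioned here order their coefficients differently: numpy.polynomial.polynomial.polyfit returns the constant term first (which is why -112.064833 leads the arrays in df_reg), while np.polyfit returns the highest power first. A minimal sketch of the difference, using the five ibm rows from the head() above and assuming degree 2 (rather than the degree 4 in the question) purely to keep the illustration well-conditioned:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Strike/vol values taken from the head() of the data shown in the question.
x = np.array([50.0, 55.0, 60.0, 65.0, 70.0])
y = np.array([42.0, 41.5, 40.0, 38.0, 36.0])

# Degree 2 is an assumption for illustration; the question uses degree 4.
asc = P.polyfit(x, y, 2)    # [c0, c1, c2]: constant term first
desc = np.polyfit(x, y, 2)  # [c2, c1, c0]: highest power first

# Each module's polyval expects its own ordering (and argument order).
fitted_asc = P.polyval(x, asc)
fitted_desc = np.polyval(desc, x)
```

Whichever function is used, the column labels attached to the coefficients need to follow that function's ordering.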
I have tried to take on board the constructive criticism of previous individuals and make my question as clear as possible; please let me know if I still need to work on this :-)
John
Changing the output of regress to a Series rather than a numpy array will give you a data frame when you groupby. The index of the Series will be the column names:
In [37]: import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['6/10/2015', '1/19/2016', 'ibm', 50, 42.0],
     ['6/10/2015', '1/19/2016', 'ibm', 55, 41.5],
     ['6/10/2015', '1/19/2016', 'ibm', 60, 40.0],
     ['6/10/2015', '1/19/2016', 'ibm', 65, 38.0],
     ['6/10/2015', '1/19/2016', 'ibm', 70, 36.0]],
    columns=['date', 'expiry', 'symbol', 'strike', 'vol'])

def regress(df):
    y = df['vol']
    x = df['strike']
    z = np.polyfit(x, y, 4)
    # np.polyfit returns the highest power first, so label the values 4..0
    return pd.Series(z, name='order', index=range(5)[::-1])

group_cols = ['date', 'expiry', 'symbol']
coeffs = df.groupby(group_cols).apply(regress)
coeffs

Out[40]:
order                                    4         3     2         1    0
date      expiry    symbol
6/10/2015 1/19/2016 ibm      -5.388312e-18  0.000667 -0.13  8.033333 -118
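Once the coefficients sit in a data frame indexed by the grouping columns, a single group's coefficients can be pulled out with .loc and a (date, expiry, symbol) tuple. The following is a rough sketch under a few assumptions of mine: degree 2 instead of 4 to keep the toy fit well-conditioned, an explicit [['strike', 'vol']] column selection before apply so the function only sees the columns it needs, and the illustrative name fitted_at_60.

```python
import numpy as np
import pandas as pd

# The five ibm rows from the question.
df = pd.DataFrame({
    'date': ['6/10/2015'] * 5,
    'expiry': ['1/19/2016'] * 5,
    'symbol': ['ibm'] * 5,
    'strike': [50, 55, 60, 65, 70],
    'vol': [42.0, 41.5, 40.0, 38.0, 36.0],
})
group_cols = ['date', 'expiry', 'symbol']

def regress(group):
    # Degree 2 is an assumption for this sketch; labels run 2..0 because
    # np.polyfit returns the highest power first.
    z = np.polyfit(group['strike'], group['vol'], 2)
    return pd.Series(z, index=[2, 1, 0])

coeffs = df.groupby(group_cols)[['strike', 'vol']].apply(regress)

# One row of coefficients, selected by the full group key.
row = coeffs.loc[('6/10/2015', '1/19/2016', 'ibm')]

# Evaluate that group's fitted polynomial at a strike of 60.
fitted_at_60 = np.polyval(row.to_numpy(), 60)
```

Here row is an ordinary Series, so row[2], row[1] and row[0] give the individual coefficients.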
To get columns containing the coefficients for each combination of date, expiry and symbol, you can then merge df and coeffs on these columns:
In [25]: df.merge(coeffs.reset_index(), on=group_cols)
Out[25]:
        date     expiry symbol  strike   vol             4         3     2         1    0
0  6/10/2015  1/19/2016    ibm      50  42.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
1  6/10/2015  1/19/2016    ibm      55  41.5 -6.644454e-18  0.000667 -0.13  8.033333 -118
2  6/10/2015  1/19/2016    ibm      60  40.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
3  6/10/2015  1/19/2016    ibm      65  38.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
4  6/10/2015  1/19/2016    ibm      70  36.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
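You can then do something like this to compute the fitted vol for each row:
df = df.merge(coeffs.reset_index(), on=group_cols)
# Column i of strike_powers holds strike**i, matching coefficient column i
strike_powers = pd.DataFrame(dict((i, df.strike**i) for i in range(5)))
# Multiply each power by its coefficient and sum across the row
df['modelled_vol'] = (strike_powers * df[list(range(5))]).sum(axis=1)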
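For reference, here is one possible end-to-end version of the approach with more than one group. It is only a sketch under assumptions of mine: the gs rows are made-up illustrative values (not from the original data), the degree is 2 rather than 4, and the coefficient columns get string labels c2, c1, c0 (hypothetical names) instead of integers so they do not mix with the other column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['6/10/2015'] * 10,
    'expiry': ['1/19/2016'] * 10,
    'symbol': ['ibm'] * 5 + ['gs'] * 5,
    'strike': [50, 55, 60, 65, 70] * 2,
    # ibm vols are from the question; gs vols are invented for illustration.
    'vol': [42.0, 41.5, 40.0, 38.0, 36.0, 30.0, 29.0, 28.5, 28.0, 27.0],
})
group_cols = ['date', 'expiry', 'symbol']
degree = 2  # assumption; the question fits degree 4

def regress(group):
    # np.polyfit returns the highest power first: c2, c1, c0.
    z = np.polyfit(group['strike'], group['vol'], degree)
    labels = ['c%d' % i for i in range(degree, -1, -1)]
    return pd.Series(z, index=labels)

# One row of coefficients per (date, expiry, symbol) group.
coeffs = df.groupby(group_cols)[['strike', 'vol']].apply(regress)

# Tie the coefficients back to every original row of its group.
merged = df.merge(coeffs.reset_index(), on=group_cols)

# Evaluate each row's polynomial: sum of c_i * strike**i.
merged['modelled_vol'] = sum(
    merged['c%d' % i] * merged['strike'] ** i for i in range(degree + 1)
)
```

The modelled_vol column can then be compared against vol row by row, for example to inspect the residuals of each fitted skew.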