python - Assigning (or tying in) function results back to original data in pandas


I am struggling to extract the regression coefficients once I complete the function call to polyfit (actual code below). I am able to display each coefficient, but I am unsure how to extract them for future use with the original data.

df=pd.read_csv('2_skews.csv') 

Here is the head() of the data:

        date     expiry symbol  strike   vol
0  6/10/2015  1/19/2016    ibm      50  42.0
1  6/10/2015  1/19/2016    ibm      55  41.5
2  6/10/2015  1/19/2016    ibm      60  40.0
3  6/10/2015  1/19/2016    ibm      65  38.0
4  6/10/2015  1/19/2016    ibm      70  36.0

There are many symbols with many strikes across many days, and many expiry dates as well.

I have grouped the data by date, symbol, and expiry, and then call a regression function like this:

df_reg=df.groupby(['date','symbol','expiry']).apply(regress) 

I have a function that seems to work (it gives the proper coefficients), but I don't seem to be able to access them and tie them back to the original data.

def regress(df):
    y = df['vol']
    x = df['strike']
    z = p.polyfit(x, y, 4)  # degree-4 fit of vol against strike
    return z

I am importing polyfit like this:

from numpy.polynomial import polynomial as p
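A note on reading the coefficients: numpy.polynomial.polynomial.polyfit returns them lowest power first, while np.polyfit returns them highest power first. The sketch below illustrates the difference on a made-up quadratic (the x and y values are assumptions for illustration, not data from the question).

```python
import numpy as np
from numpy.polynomial import polynomial as p

# Made-up data lying exactly on y = 1 + 2x + 3x^2
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

ascending = p.polyfit(x, y, 2)    # constant term first
descending = np.polyfit(x, y, 2)  # highest power first

print(ascending)
print(descending)
```

So an array coming out of p.polyfit reads as [constant, x, x^2, ...], which matches the order in the results shown in this question.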

The final results:

df_reg
date       symbol  expiry
5/19/2015  gs      1/19/2016    [-112.064833151, 6.76871521993, -0.11147562136...
                   3/21/2016    [-131.2914493, 7.16441276062, -0.1145534833, 0...
           ibm     1/19/2016    [211.458028147, -5.01236287512, 0.044819313514...
                   3/21/2016    [-34.1027973807, 3.16990194634, -0.05676206572...
6/10/2015  gs      1/19/2016    [50.3916788503, 0.795484227762, -0.02701849495...
                   3/21/2016    [31.6090441114, 0.851878910113, -0.01972772270...
           ibm     1/19/2016    [-13.6159660078, 3.23002791603, -0.06015739505...
                   3/21/2016    [-51.6709051223, 4.80288173687, -0.08600312989...
dtype: object

The top result has the functional form:

y = -0.000002x^4 + 0.000735x^3 - 0.111476x^2 + 6.768715x - 112.064833
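For illustration, a polynomial in this form can be evaluated with numpy.polynomial.polynomial.polyval, which expects the coefficients lowest power first. This is a minimal sketch: the coefficients are the rounded values printed in the equation above, so results are only approximate, and fitted_vol is a hypothetical helper name.

```python
from numpy.polynomial import polynomial as p

# Rounded coefficients from the equation above, constant term first
coeffs = [-112.064833, 6.768715, -0.111476, 0.000735, -0.000002]

def fitted_vol(strike):
    # Hypothetical helper: evaluates c0 + c1*x + c2*x^2 + c3*x^3 + c4*x^4
    return p.polyval(strike, coeffs)

print(fitted_vol(100.0))
```

With the full-precision array from polyfit in place of the rounded list, the same call gives the fitted vol at any strike.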

I have tried to take on board the constructive criticism of previous individuals and make the question as clear as possible. Please let me know if I still need to work on it :-)

John

Changing the output of regress to a Series rather than a numpy array will give you a data frame when you groupby. The index of the Series becomes the column names:

In [37]:
df = pd.DataFrame(
    [['6/10/2015', '1/19/2016', 'ibm', 50, 42.0],
     ['6/10/2015', '1/19/2016', 'ibm', 55, 41.5],
     ['6/10/2015', '1/19/2016', 'ibm', 60, 40.0],
     ['6/10/2015', '1/19/2016', 'ibm', 65, 38.0],
     ['6/10/2015', '1/19/2016', 'ibm', 70, 36.0]],
    columns=['date', 'expiry', 'symbol', 'strike', 'vol'])

def regress(df):
    y = df['vol']
    x = df['strike']
    z = np.polyfit(x, y, 4)  # np.polyfit returns the highest power first
    # Label each coefficient with its power: 4, 3, 2, 1, 0
    return pd.Series(z, name='order', index=range(5)[::-1])

group_cols = ['date', 'expiry', 'symbol']
coeffs = df.groupby(group_cols).apply(regress)
coeffs

Out[40]:
order                                   4         3     2         1    0
date      expiry    symbol
6/10/2015 1/19/2016 ibm     -5.388312e-18  0.000667 -0.13  8.033333 -118

To get columns containing the coefficients for each combination of date, expiry, and symbol, you can then merge df and coeffs on these columns:

In [25]: df.merge(coeffs.reset_index(), on=group_cols)
Out[25]:
        date     expiry symbol  strike   vol             4         3     2         1    0
0  6/10/2015  1/19/2016    ibm      50  42.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
1  6/10/2015  1/19/2016    ibm      55  41.5 -6.644454e-18  0.000667 -0.13  8.033333 -118
2  6/10/2015  1/19/2016    ibm      60  40.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
3  6/10/2015  1/19/2016    ibm      65  38.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
4  6/10/2015  1/19/2016    ibm      70  36.0 -6.644454e-18  0.000667 -0.13  8.033333 -118
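If you prefer not to rely on how groupby().apply() shapes its result, a more explicit variant of the same idea is to loop over the groups, build one row of coefficients per group, and merge that table back. This is only a sketch on the five sample rows from the question; the coef_4 ... coef_0 column names and the coeff_table and merged variables are my own naming, not anything pandas produces for you.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question
df = pd.DataFrame({
    'date': ['6/10/2015'] * 5,
    'expiry': ['1/19/2016'] * 5,
    'symbol': ['ibm'] * 5,
    'strike': [50, 55, 60, 65, 70],
    'vol': [42.0, 41.5, 40.0, 38.0, 36.0],
})
group_cols = ['date', 'expiry', 'symbol']

rows = []
for key, group in df.groupby(group_cols):
    z = np.polyfit(group['strike'], group['vol'], 4)  # highest power first
    row = dict(zip(group_cols, key))                  # the group's key values
    # Hypothetical names: coef_4 is the x^4 coefficient, coef_0 the constant
    row.update({f'coef_{4 - i}': c for i, c in enumerate(z)})
    rows.append(row)

coeff_table = pd.DataFrame(rows)               # one row per group
merged = df.merge(coeff_table, on=group_cols)  # coefficients repeated on every original row
print(merged)
```

String column names also avoid mixing integer and string labels in the merged frame, which some people find easier to work with.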

You can then do something like:

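df = df.merge(coeffs.reset_index(), on=group_cols)
# One column per power of strike: column i holds strike**i
strike_powers = pd.DataFrame(dict((i, df.strike**i) for i in range(5)))
# Multiply each power by its matching coefficient column and sum across the row
df['modelled_vol'] = (strike_powers * df[list(range(5))]).sum(axis=1)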
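As a sanity check on that calculation, the sum of coefficient times strike**power should agree with np.polyval. The sketch below shows this for a single group, using the strikes and vols from the sample rows; the variable names are just for illustration.

```python
import numpy as np

strikes = np.array([50.0, 55.0, 60.0, 65.0, 70.0])
vols = np.array([42.0, 41.5, 40.0, 38.0, 36.0])

z = np.polyfit(strikes, vols, 4)     # highest power first
by_polyval = np.polyval(z, strikes)  # evaluates the fitted polynomial

# The same calculation as an explicit sum of coefficient * strike**power
powers = np.arange(4, -1, -1)        # 4, 3, 2, 1, 0 to match the order of z
by_powers = (z * strikes[:, None] ** powers).sum(axis=1)

print(by_polyval)
print(by_powers)
```

Because a degree-4 polynomial through five points interpolates them, both results should also reproduce the original vols up to floating-point error for this particular sample.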
