python - Pandas - Convert Unbalanced Panel Data to Cross Section -


actually unsure if end of cross-section because it's on time period, think still.

i have data frame looks this:

player          finish  tournament  year    id ------------------------------------------------ aaron baddeley  9       memorial    2012    1 aaron baddeley  17      masters     2013    1 aaron watkins   15      open     2012    2 adam scott      9       open     2014    3 adam scott      4       memorial    2014    3 alex cejka      8       open     2010    4 andres romero   2       memorial    2012    5 andrew svoboda  19      memorial    2014    6 andy sullivan   13      memorial    2015    7 

i want convert data single observations, desired output this:

player           2012_memorial    2013_memorial    2014_memorial   ...  id ---------------------------------------------------------------------------- aaron baddeley        9                 17              2012             1 adam scott            na                na               9               3  . .  . 

i've found split-apply-combine paradigm, looks promising. on surface, i've done df.groupby('id') , print statement outputs this:

               player  finish tournament  year  id                                                                         1      aaron baddeley       9   memorial  2012  2       aaron watkins      15    open  2012  3          adam scott       9    open  2014  

so seems have collapsed groups, i've lost data? or how object stored? realize haven't done apply stage, how generate new rows , new columns, don't know next step or if there's cookbook example this.

thanks, jared

it looks you're looking way pivot_table operation.

starting approximation of table:

df = pd.dataframe({     'player': ['ab', 'ab', 'aw', 'as', 'as'],     'finish': [9, 17, 15, 9, 4],     'tournament': ['m', 'm', 'us', 'us', 'm'],     'year': [12, 13, 12, 12, 14],     'id': [1, 1, 2, 3, 3]}) 

giving:

>> df         finish  id  player  tournament  year 0   9   1   ab  m   12 1   17  1   ab  m   13 2   15  2   aw   12 3   9   3     12 4   4   3    m   14 

you can run

>> df.pivot_table(values='finish', index=['player', 'id'], columns=['year', 'tournament'])     year    12  13  14     tournament  m    m   m player  id               ab  1   9   nan     17  nan  3   nan     9   nan     4 aw  2   nan     15  nan     nan 

note gives multilevel column of years/tournaments. think cleaner, can flatten it.


Comments

Popular posts from this blog

javascript - gulp-nodemon - nodejs restart after file change - Error: listen EADDRINUSE events.js:85 -

Fatal Python error: Py_Initialize: unable to load the file system codec. ImportError: No module named 'encodings' -

oracle - Changing start date for system jobs related to automatic statistics collections in 11g -