python - Pandas - Convert Unbalanced Panel Data to Cross Section -
actually unsure if end of cross-section because it's on time period, think still.
i have data frame looks this:
player finish tournament year id ------------------------------------------------ aaron baddeley 9 memorial 2012 1 aaron baddeley 17 masters 2013 1 aaron watkins 15 open 2012 2 adam scott 9 open 2014 3 adam scott 4 memorial 2014 3 alex cejka 8 open 2010 4 andres romero 2 memorial 2012 5 andrew svoboda 19 memorial 2014 6 andy sullivan 13 memorial 2015 7
i want convert data single observations, desired output this:
player 2012_memorial 2013_memorial 2014_memorial ... id ---------------------------------------------------------------------------- aaron baddeley 9 17 2012 1 adam scott na na 9 3 . . .
i've found split-apply-combine
paradigm, looks promising. on surface, i've done df.groupby('id')
, print statement outputs this:
player finish tournament year id 1 aaron baddeley 9 memorial 2012 2 aaron watkins 15 open 2012 3 adam scott 9 open 2014
so seems have collapsed groups, i've lost data? or how object stored? realize haven't done apply stage, how generate new rows , new columns, don't know next step or if there's cookbook example this.
thanks, jared
it looks you're looking way pivot_table
operation.
starting approximation of table:
df = pd.dataframe({ 'player': ['ab', 'ab', 'aw', 'as', 'as'], 'finish': [9, 17, 15, 9, 4], 'tournament': ['m', 'm', 'us', 'us', 'm'], 'year': [12, 13, 12, 12, 14], 'id': [1, 1, 2, 3, 3]})
giving:
>> df finish id player tournament year 0 9 1 ab m 12 1 17 1 ab m 13 2 15 2 aw 12 3 9 3 12 4 4 3 m 14
you can run
>> df.pivot_table(values='finish', index=['player', 'id'], columns=['year', 'tournament']) year 12 13 14 tournament m m m player id ab 1 9 nan 17 nan 3 nan 9 nan 4 aw 2 nan 15 nan nan
note gives multilevel column of years/tournaments. think cleaner, can flatten it.
Comments
Post a Comment