python - pandas: how to sort results of groupby using a pd.cut categorical variable

- July 15, 2010

I have a DataFrame that is the output of a groupby on a categorical variable created with pd.cut.

    import pandas as pd
    import numpy as np

    di = pd.DataFrame({'earnings': np.random.choice(10000, 10000), 'counts': [1] * 10000})
    brackets = np.append(np.arange(0, 5001, 500), 100000000)
    # right=False makes each bin closed on the left, e.g. [0, 500)
    di['earncat'] = pd.cut(di['earnings'], brackets, right=False, retbins=True)[0]

    di_everyone = di.groupby('earncat').sum()[['counts']]
    di_everyone.sort_index(inplace=True)
    print(di_everyone.to_string())

And the output is:

    [0, 500)          83,005,823
    [1000, 1500)      11,995,255
    [1500, 2000)      13,943,052
    [2000, 2500)      11,967,696
    [2500, 3000)      10,741,178
    [3000, 3500)       9,749,914
    [3500, 4000)       6,833,928
    [4000, 4500)       7,150,125
    [4500, 5000)       4,655,773
    [500, 1000)        9,718,753
    [5000, 100000000) 26,588,622

I'm not sure why [500, 1000) appears on the second-to-last line. I decided not to label earncat because I want to see the breakdown. How can I sort on earncat?

Thanks in advance.

You are using pandas 0.15.x, which does not support this kind of operation on categorical dtypes (which the pd.cut function produces).
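To see why the rows come out in that order, it helps to assume the bin labels are being compared as plain strings. That is an assumption on my part, but it reproduces the ordering shown in the question. A minimal sketch in plain Python, with a few labels copied from the output above:

```python
# A few bin labels from the question, listed in their intended numeric order.
labels = ["[0, 500)", "[500, 1000)", "[1000, 1500)", "[4500, 5000)", "[5000, 100000000)"]

# Strings are compared character by character, so "[1000" and "[4500"
# sort ahead of "[500", and "[500," sorts ahead of "[5000" because ',' < '0'.
as_strings = sorted(labels)
print(as_strings)
# ['[0, 500)', '[1000, 1500)', '[4500, 5000)', '[500, 1000)', '[5000, 100000000)']
```

So [500, 1000) lands second from the end, just as in the question, whenever the index is sorted as text rather than by bin position.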

In the meantime, you can work around the problem like this:

    # Assumes the bin labels are strings such as '[0, 500)'
    di['earnlower'] = di['earncat'].apply(lambda x: int(x[1:].split(',')[0]))
    di['earnhigher'] = di['earncat'].apply(lambda x: int(x[:-1].split(',')[1]))

    di_everyone = di.groupby(['earnlower', 'earnhigher']).sum()[['counts']]
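For comparison, here is a rough sketch of how the same grouping behaves on a more recent pandas release than the 0.15.x mentioned above. It assumes a version where pd.cut returns an ordered categorical of intervals and groupby accepts the observed keyword. The seeded generator and the size of the sample are my own choices for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is repeatable (illustrative choice).
rng = np.random.default_rng(0)
di = pd.DataFrame({"earnings": rng.integers(0, 10000, size=10000), "counts": 1})
brackets = np.append(np.arange(0, 5001, 500), 100000000)

# right=False gives left-closed bins such as [0, 500); the result is an
# ordered categorical, so the bins carry their own order.
di["earncat"] = pd.cut(di["earnings"], brackets, right=False)

# observed=False keeps every bin, and groupby sorts the groups by
# category order rather than by the text of the label.
di_everyone = di.groupby("earncat", observed=False)[["counts"]].sum()
print(di_everyone.to_string())
```

With this setup no string parsing is needed, because the grouping key is never reduced to text.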
