Python pandas: how to sort the results of a groupby on a pd.cut categorical variable
I have a data frame that is the output of a groupby on a categorical variable created with pd.cut.
    import pandas as pd
    import numpy as np

    di = pd.DataFrame({'earnings': np.random.choice(10000, 10000),
                       'counts': [1] * 10000})
    brackets = np.append(np.arange(0, 5001, 500), 100000000)
    di['earncat'] = pd.cut(di['earnings'], brackets, right=False, retbins=True)[0]
    di_everyone = di.groupby('earncat')[['counts']].sum()
    di_everyone.sort_index(inplace=True)
    print(di_everyone.to_string())
And the output:

    [0, 500)             83,005,823
    [1000, 1500)         11,995,255
    [1500, 2000)         13,943,052
    [2000, 2500)         11,967,696
    [2500, 3000)         10,741,178
    [3000, 3500)          9,749,914
    [3500, 4000)          6,833,928
    [4000, 4500)          7,150,125
    [4500, 5000)          4,655,773
    [500, 1000)           9,718,753
    [5000, 100000000)    26,588,622
I'm not sure why [500, 1000) appears on the second-to-last line. I decided not to label earncat because I want to see the breakdown. How can I sort on earncat?

Thanks in advance!
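You are using pandas 0.15.x, which does not support this kind of operation on categorical dtypes (which the pd.cut function produces). The bins are effectively sorted as plain strings, character by character, which is why [500, 1000) lands after [4500, 5000) and just before [5000, 100000000).

In the meantime, you can work around the problem like this: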
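A minimal plain-Python sketch of that string ordering: sorting the bin labels from the question as strings reproduces the odd order in the output, while sorting on the parsed lower edge (the same parsing idea the workaround uses) restores bin order. The helper name `lower_bound` is hypothetical, chosen only for this illustration.

```python
# Bin labels as they appear in the question's output (plain strings).
labels = [
    "[0, 500)", "[500, 1000)", "[1000, 1500)", "[1500, 2000)",
    "[2000, 2500)", "[2500, 3000)", "[3000, 3500)", "[3500, 4000)",
    "[4000, 4500)", "[4500, 5000)", "[5000, 100000000)",
]

def lower_bound(label):
    """Hypothetical helper: parse the lower edge out of a label like '[500, 1000)'."""
    return int(label[1:].split(",")[0])

# A string sort compares character by character: '[500, 1000)' sorts after
# '[4500, 5000)' because '4' < '5', and before '[5000, ...)' because ',' < '0'.
lexicographic = sorted(labels)

# Sorting on the numeric lower edge gives the intended bin order.
numeric = sorted(labels, key=lower_bound)

print(lexicographic)
print(numeric)
```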
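    # In pandas 0.15.x the bin labels are strings such as '[500, 1000)',
    # so parse the numeric edges out of them.
    di['earnlower'] = di['earncat'].apply(lambda x: int(x[1:].split(',')[0]))   # '[500, 1000)' -> 500
    di['earnhigher'] = di['earncat'].apply(lambda x: int(x[:-1].split(',')[1]))  # '[500, 1000)' -> 1000
    # Grouping on the integer edges sorts numerically instead of as strings.
    di_everyone = di.groupby(['earnlower', 'earnhigher'])[['counts']].sum()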
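For comparison, a sketch assuming a recent pandas release (1.x or 2.x), where pd.cut produces categories of `pd.Interval` objects rather than strings, so the string slicing above would not apply. Grouping on the categorical there follows the category (bin) order directly. The seeded generator (`np.random.default_rng(0)`) is an assumption added for reproducibility, not part of the original code.

```python
import numpy as np
import pandas as pd

# Same setup as the question, adapted for a recent pandas release
# (pd.np no longer exists there, so NumPy is used directly).
rng = np.random.default_rng(0)
di = pd.DataFrame({
    "earnings": rng.choice(10000, 10000),
    "counts": [1] * 10000,
})
brackets = np.append(np.arange(0, 5001, 500), 100000000)
di["earncat"] = pd.cut(di["earnings"], brackets, right=False)

# Grouping on the categorical keeps the categories' own order, i.e. bin
# order; observed=False also keeps any empty bins in the result.
di_everyone = di.groupby("earncat", observed=False)[["counts"]].sum()

# Each index entry is a pd.Interval, so the bin edges are numeric
# attributes and no string parsing is needed.
di_everyone["earnlower"] = [int(iv.left) for iv in di_everyone.index]
di_everyone["earnhigher"] = [int(iv.right) for iv in di_everyone.index]

print(di_everyone)
```

Because the grouping key is already ordered, no sort_index call is needed, and the two list comprehensions recover the same earnlower/earnhigher values that the workaround parses from strings.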