python - Realigning labels and levels in multi-level columns in a Pandas DataFrame
Based on the example DataFrame with multi-level columns found in a related question:
    import numpy as np
    import pandas as pd

    arrays = [[1, 2]*3, ['a', 'b', 'c']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.random.randn(2,6), columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo                1         2         1         2         1         2
    bar                a         b         c         a         b         c
    2000-01-03  2.040867 -1.727071  0.126223  2.101799  1.811349 -0.003293
    2000-01-04 -2.644979  0.156852 -1.034756  0.609625 -0.213968 -0.293759
I need to re-order the column labels so that they follow this pattern instead, re-aligning the data as necessary:

    bar                a         a         b         b         c         c
    foo                1         2         1         2         1         2
    2000-01-03  2.040867  2.101799  1.811349 -1.727071  0.126223 -0.003293
    2000-01-04 -2.644979  0.609625 -0.213968  0.156852 -1.034756 -0.293759
I swapped the column levels with either of these:

    # Option 1
    df.columns = df.columns.reorder_levels([1,0])

    # Option 2
    df.columns = df.columns.swaplevel(0, 1)
This worked, but how do I go about moving the labels around while keeping the data aligned?
I tried the following:
    # set_labels was renamed to set_codes in pandas 0.24
    df.columns = df.columns.set_codes([0,0,1,1,2,2], level=0).set_codes([0,1,0,1,0,1], level=1)
Unfortunately, the data does not move with the labels (i.e. the data is misaligned):

    bar                a                   b                   c
    foo                1         2         1         2         1         2
    2000-01-03  2.040867 -1.727071  0.126223  2.101799  1.811349 -0.003293
    2000-01-04 -2.644979  0.156852 -1.034756  0.609625 -0.213968 -0.293759

I have tried many things, including reindexing, to no avail.
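The misalignment described above comes from the fact that assigning to df.columns only renames columns by position, while a label-based lookup such as reindex moves the data together with its labels. The following is a minimal sketch of that difference, assuming deterministic integer data and a hypothetical target index built with MultiIndex.from_product; it is an illustration, not the asker's original attempt.

```python
import numpy as np
import pandas as pd

# Columns already swapped to (bar, foo), but still in the original order.
columns = pd.MultiIndex.from_arrays(
    [['a', 'b', 'c', 'a', 'b', 'c'], [1, 2, 1, 2, 1, 2]],
    names=['bar', 'foo'],
)
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Hypothetical target order: a/1, a/2, b/1, b/2, c/1, c/2.
target = pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]],
                                    names=['bar', 'foo'])

# Assigning new labels is a positional rename: the values stay where they are.
relabeled = df.copy()
relabeled.columns = target

# reindex looks every target label up in the old columns: the values move too.
reindexed = df.reindex(columns=target)

print(relabeled.iloc[0].tolist())  # values in the original positional order
print(reindexed.iloc[0].tolist())  # values re-ordered together with the labels
```

In this sketch the value that belongs to ('a', 2) ends up under that label only in the reindexed frame.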
Edit: the above is a figurative example. In reality, my data contains categorical labels [min, max, single] in the second level of the column axis (after the swap), i.e.:
    arrays = [['max', 'min', 'single']*3, np.repeat(['a', 'b', 'c'], 3)]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.random.randn(2,9), columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo              max       min    single       max       min    single       max       min    single
    bar                a         a         a         b         b         b         c         c         c
    2000-01-03 -0.004233  0.820975  1.481674  0.064850 -0.178978 -0.862092  0.136279 -0.517081 -1.557611
    2000-01-04  0.835346 -0.403773 -0.035985  1.079355  1.780113 -1.037420  1.459070 -0.254668 -0.091501

    df.columns = df.columns.swaplevel(0,1)
    print(df)

    bar                a                             b                             c
    foo              max       min    single       max       min    single       max       min    single
    2000-01-03 -0.004233  0.820975  1.481674  0.064850 -0.178978 -0.862092  0.136279 -0.517081 -1.557611
    2000-01-04  0.835346 -0.403773 -0.035985  1.079355  1.780113 -1.037420  1.459070 -0.254668 -0.091501
For the second level, I want to use an explicitly provided order: ['min', 'max', 'single']. Therefore, sorting alphabetically as suggested by @primer will not work.
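One possible way to get an explicitly provided order on the column axis is to build the full target column index and reindex against it. The sketch below assumes that every (bar, foo) combination exists in the data and uses deterministic integers instead of random values; the names order and target are hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

arrays = [['max', 'min', 'single'] * 3, np.repeat(['a', 'b', 'c'], 3)]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.arange(18).reshape(2, 9), columns=columns,
                  index=pd.date_range('20000103', periods=2))
df.columns = df.columns.swaplevel(0, 1)

# Explicit order for the second level (assumption: taken from the question).
order = ['min', 'max', 'single']

# Target columns: every bar value combined with foo in the requested order.
target = pd.MultiIndex.from_product([['a', 'b', 'c'], order],
                                    names=['bar', 'foo'])

# reindex matches on labels, so each column keeps its own data.
out = df.reindex(columns=target)
print(out)
```

Because the lookup is label-based, no alphabetical sort is involved and the order list alone decides the layout within each bar group.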
After you have swapped the levels, you need to use .sort_index, which should produce the desired result. This works for me:
    import numpy as np
    import pandas as pd

    arrays = [[1, 2]*3, ['a', 'b', 'c']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
    df = pd.DataFrame(np.random.randn(2,6), columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo             1      2      1      2      1      2
    bar             a      b      c      a      b      c
    2000-01-03 -1.165  0.901  0.466 -1.536  1.488  1.896
    2000-01-04  1.179 -0.180 -1.071  1.054 -0.403  1.222

    # Swap the levels, then sort the column axis so the data follows the labels
    df.columns = df.columns.swaplevel(0, 1)
    df.sort_index(axis=1, inplace=True)
    print(df)

    bar             a             b             c
    foo             1      2      1      2      1      2
    2000-01-03 -1.165 -1.536  1.488  0.901  0.466  1.896
    2000-01-04  1.179  1.054 -0.403 -0.180 -1.071  1.222
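To check that the data really stays attached to its labels, the same two steps can be run on deterministic values. This is a small sketch with integer data (an assumption made so the result is reproducible); out is a hypothetical name for the re-ordered copy.

```python
import numpy as np
import pandas as pd

arrays = [[1, 2] * 3, ['a', 'b', 'c'] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns,
                  index=pd.date_range('20000103', periods=2))

out = df.copy()
out.columns = out.columns.swaplevel(0, 1)  # levels become (bar, foo)
out = out.sort_index(axis=1)               # sort columns by bar, then foo

# Each (bar, foo) column should hold what the (foo, bar) column held before.
for foo, bar in df.columns:
    assert out[(bar, foo)].tolist() == df[(foo, bar)].tolist()

print(out)
```

The loop compares every column by label, so it would fail if the sort had only rearranged the header.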
Update: Categorical has an ordered attribute that you can use to set up the order you want, and then sort with .sort_index or .sort (.sort_values in current pandas). Here is an illustration of how to set up a custom order for foo:
    array1 = ['a', 'b']*3
    array2 = ['min', 'max', 'single']*2
    columns = pd.MultiIndex.from_tuples(list(zip(array2, array1)), names=['foo', 'bar'])
    df = pd.DataFrame(np.random.randn(2,6), columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo        single    max    min single    max    min
    bar             a      b      a      b      a      b
    2000-01-03  0.098  0.583 -0.399  0.370 -1.307  1.658
    2000-01-04 -0.118 -0.680  0.666 -0.461 -1.334 -1.347

    # Here I extract the foo column to illustrate that it is categorical
    # and sorted in the 'wrong' order:
    df = df.stack().stack().reset_index().rename(columns={'level_0':'date', 0:'val'})
    df['foo'] = df.foo.astype('category')
    print(df.foo)

    0        max
    1        min
    2     single
    3        max
    4        min
    5     single
    6        max
    7        min
    8     single
    9        max
    10       min
    11    single
    Name: foo, dtype: category
    Categories (3, object): [max, min, single]
Note the list in the last line: this is the order (alphabetical) in which it will be sorted if you call .sort on it.
Now redefine the column with the new order:
    df['foo'] = pd.Categorical(np.asarray(df.foo), categories=['min','single','max'], ordered=True)
    print(df.foo)

    0        max
    1        min
    2     single
    3        max
    4        min
    5     single
    6        max
    7        min
    8     single
    9        max
    10       min
    11    single
    Name: foo, dtype: category
    Categories (3, object): [min < single < max]
Note that the last line now lists the new order, and the < signs tell you that it is an ordered Categorical series. It will be sorted in the shown order when you call .sort on it.
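As a quick check of that behaviour, here is a minimal sketch that sorts a small long-form frame by an ordered categorical column. It assumes a recent pandas, where sort_values takes the place of the older .sort, and uses made-up values in a hypothetical long_df.

```python
import pandas as pd

# Hypothetical long-form data, similar in shape to the stacked frame above.
long_df = pd.DataFrame({
    'foo': ['max', 'min', 'single', 'max', 'min', 'single'],
    'val': [1, 2, 3, 4, 5, 6],
})

# Same custom order as in the answer: min < single < max.
long_df['foo'] = pd.Categorical(long_df['foo'],
                                categories=['min', 'single', 'max'],
                                ordered=True)

# A stable sort keeps the original row order within each category.
sorted_df = long_df.sort_values('foo', kind='mergesort')
print(sorted_df)
```

The rows come out grouped by category order rather than alphabetically.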
Hope this helps.