python - Realigning labels and levels in multi-level columns in Pandas DataFrame -

- May 15, 2013

Based on this example DataFrame with multi-level columns, found in another question:

    import numpy as np
    import pandas as pd

    arrays = [[1, 2]*3, ['a', 'b', 'c']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])

    df = pd.DataFrame(np.random.randn(2, 6),
                      columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo                1         2         1         2         1         2
    bar                a         b         c         a         b         c
    2000-01-03  2.040867 -1.727071  0.126223  2.101799  1.811349 -0.003293
    2000-01-04 -2.644979  0.156852 -1.034756  0.609625 -0.213968 -0.293759

I need to re-order the column labels so that they follow this pattern instead, re-aligning the data as necessary:

    bar                a         a         b         b         c         c
    foo                1         2         1         2         1         2
    2000-01-03  2.040867  2.101799  1.811349 -1.727071  0.126223 -0.003293
    2000-01-04 -2.644979  0.609625 -0.213968  0.156852 -1.034756 -0.293759

I swapped the column levels with either of these:

    # option 1
    df.columns = df.columns.reorder_levels([1, 0])

    # option 2
    df.columns = df.columns.swaplevel(0, 1)

This worked, but how do I go about moving the labels around while keeping the data aligned?

I tried the following:

    # note: set_labels is the older name of this method; pandas 0.24 and later call it set_codes
    df.columns = df.columns.set_labels([0,0,1,1,2,2], level=0).set_labels([0,1,0,1,0,1], level=1)

Unfortunately, the data did not move with the labels (i.e. the data is now misaligned):

    bar                a                   b                   c
    foo                1         2         1         2         1         2
    2000-01-03  2.040867 -1.727071  0.126223  2.101799  1.811349 -0.003293
    2000-01-04 -2.644979  0.156852 -1.034756  0.609625 -0.213968 -0.293759
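The misalignment described above can be reproduced with deterministic data. This is only a minimal sketch: it assumes a recent pandas where the method is called `set_codes` rather than `set_labels`, and the names `swapped` and `relabelled` are made up for illustration. Rewriting the codes changes which label sits above each column, but the underlying values stay where they were.

```python
import numpy as np
import pandas as pd

arrays = [[1, 2] * 3, ['a', 'b', 'c'] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
# Integers 0..11 instead of random numbers, so columns are easy to track.
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Swap the levels only: labels and data still belong together.
swapped = df.copy()
swapped.columns = swapped.columns.swaplevel(0, 1)

# Overwrite the codes (assumption: set_codes is the modern set_labels).
relabelled = swapped.copy()
relabelled.columns = (relabelled.columns
                      .set_codes([0, 0, 1, 1, 2, 2], level=0)
                      .set_codes([0, 1, 0, 1, 0, 1], level=1))

print(list(relabelled.columns))
# The values are untouched, so ('b', 2) now points at different data.
print(swapped[('b', 2)].tolist(), relabelled[('b', 2)].tolist())
```

In other words, editing the codes is a relabelling operation, not a reordering one, which is why the data appears misaligned afterwards.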

I have tried many things, including reindexing, to no avail.

Edit: the above is a figurative example. In reality, my data contains the categorical labels [min, max, single] in the second level of the column axis (after the swap), i.e.:

    arrays = [['max', 'min', 'single']*3, np.repeat(['a', 'b', 'c'], 3)]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])

    df = pd.DataFrame(np.random.randn(2, 9),
                      columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo              max       min    single       max       min    single       max       min    single
    bar                a         a         a         b         b         b         c         c         c
    2000-01-03 -0.004233  0.820975  1.481674  0.064850 -0.178978 -0.862092  0.136279 -0.517081 -1.557611
    2000-01-04  0.835346 -0.403773 -0.035985  1.079355  1.780113 -1.037420  1.459070 -0.254668 -0.091501

    df.columns = df.columns.swaplevel(0, 1)
    print(df)

    bar                a                             b                             c
    foo              max       min    single       max       min    single       max       min    single
    2000-01-03 -0.004233  0.820975  1.481674  0.064850 -0.178978 -0.862092  0.136279 -0.517081 -1.557611
    2000-01-04  0.835346 -0.403773 -0.035985  1.079355  1.780113 -1.037420  1.459070 -0.254668 -0.091501

For the second level, I want to use an explicitly provided order: ['min', 'max', 'single']. Therefore, sorting alphabetically as suggested by @primer will not work.

After you have swapped the levels, you need to use .sort_index, which will produce the desired result. This works for me:

    import numpy as np
    import pandas as pd

    arrays = [[1, 2]*3, ['a', 'b', 'c']*2]
    columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])

    df = pd.DataFrame(np.random.randn(2, 6),
                      columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo             1      2      1      2      1      2
    bar             a      b      c      a      b      c
    2000-01-03 -1.165  0.901  0.466 -1.536  1.488  1.896
    2000-01-04  1.179 -0.180 -1.071  1.054 -0.403  1.222

    df.columns = df.columns.swaplevel(0, 1)
    # sort the column axis so the data moves together with its labels
    df.sort_index(axis=1, inplace=True)
    print(df)

    bar             a             b             c
    foo             1      2      1      2      1      2
    2000-01-03 -1.165 -1.536  1.488  0.901  0.466  1.896
    2000-01-04  1.179  1.054 -0.403 -0.180 -1.071  1.222
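To check that swapping and then sorting really keeps every value under its own label, the same steps can be run on deterministic data. This is a small sketch, assuming a pandas version where `DataFrame.swaplevel` and `sort_index` accept `axis=1`; the integer data and the name `result` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

arrays = [[1, 2] * 3, ['a', 'b', 'c'] * 2]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
# Row 0 holds 0..5 in the original column order, row 1 holds 6..11.
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Swap the two column levels, then sort the column axis lexicographically.
result = df.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(result)
# Each value is still found under its original (foo, bar) pair.
for foo, bar in df.columns:
    assert result[(bar, foo)].equals(df[(foo, bar)])
```

Unlike overwriting the codes by hand, `sort_index` reorders labels and data together, so no column ends up under the wrong heading.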

Update:

A Categorical has an ordered attribute, which you can use to set up the order you want and then sort with .sort_index or .sort (.sort_values in current pandas). Here is an illustration of how to set up a custom order for foo:

    array1 = ['a', 'b']*3
    array2 = ['min', 'max', 'single']*2
    columns = pd.MultiIndex.from_tuples(list(zip(array2, array1)), names=['foo', 'bar'])

    df = pd.DataFrame(np.random.randn(2, 6),
                      columns=columns,
                      index=pd.date_range('20000103', periods=2))
    print(df)

    foo        single    max    min single    max    min
    bar             a      b      a      b      a      b
    2000-01-03  0.098  0.583 -0.399  0.370 -1.307  1.658
    2000-01-04 -0.118 -0.680  0.666 -0.461 -1.334 -1.347

    # Here I extract the foo column to illustrate that it is categorical and sorted in the 'wrong' order:
    df = df.stack().stack().reset_index().rename(columns={'level_0': 'date', 0: 'val'})
    df['foo'] = df.foo.astype('category')
    print(df.foo)

    0        max
    1        min
    2     single
    3        max
    4        min
    5     single
    6        max
    7        min
    8     single
    9        max
    10       min
    11    single
    Name: foo, dtype: category
    Categories (3, object): [max, min, single]

Note the list in the last line: this is the (alphabetical) order in which the values will be sorted if you call .sort on it.

Now redefine the column with the new order:

    df['foo'] = pd.Categorical(np.asarray(df.foo), categories=['min', 'single', 'max'], ordered=True)
    print(df.foo)

    0        max
    1        min
    2     single
    3        max
    4        min
    5     single
    6        max
    7        min
    8     single
    9        max
    10       min
    11    single
    Name: foo, dtype: category
    Categories (3, object): [min < single < max]

Note that in the last line the list is showing the new order, and the < signs are telling you that this is an ordered categorical series. It will be sorted in the shown order when you call .sort on it.

Hope this helps.
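The effect of the ordered categories on sorting can be seen on a tiny standalone Series. This sketch assumes a current pandas, where `Series.sort_values` takes the place of the older `.sort`; the names `foo`, `default_sorted` and `custom_sorted` are illustrative and not part of the answer above.

```python
import pandas as pd

foo = pd.Series(['max', 'min', 'single', 'max', 'min', 'single'])

# Default categories are ordered alphabetically: max, min, single.
default_sorted = foo.astype('category').sort_values().tolist()

# An ordered Categorical sorts by the position of each category in the list.
custom = pd.Series(pd.Categorical(foo,
                                  categories=['min', 'single', 'max'],
                                  ordered=True))
custom_sorted = custom.sort_values().tolist()

print(default_sorted)
print(custom_sorted)
```

The first list comes out alphabetically, while the second follows the explicit `min < single < max` order given to `pd.Categorical`.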
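For the wide layout in the question, one alternative to the categorical route is to build the desired column index explicitly and reindex against it. This is a sketch of a different technique from the one in the answer, not the answer's method: `foo_order`, `bar_order`, `target`, `swapped` and `result` are hypothetical names, and it assumes every (bar, foo) combination exists in the columns.

```python
import numpy as np
import pandas as pd

arrays = [['max', 'min', 'single'] * 3, np.repeat(['a', 'b', 'c'], 3)]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
# Row 0 holds 0..8 in the original column order.
df = pd.DataFrame(np.arange(18).reshape(2, 9), columns=columns,
                  index=pd.date_range('20000103', periods=2))

# Explicit order for the inner level; outer level simply sorted.
foo_order = ['min', 'max', 'single']
bar_order = sorted(set(df.columns.get_level_values('bar')))
target = pd.MultiIndex.from_product([bar_order, foo_order],
                                    names=['bar', 'foo'])

# Swap the levels, then let reindex move each column to its target slot.
swapped = df.swaplevel(0, 1, axis=1)
result = swapped.reindex(columns=target)

print(result)
```

Because `reindex` matches columns by their full label tuple, the values travel with their labels, and a combination missing from the data would show up as a column of NaN rather than as silently shifted data.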
