python - Duplicate a list in a Pandas DataFrame into a new row


I have a DataFrame in pandas in which one column (associateid) holds a list of codes against each index. It goes like this:

indexbudgetcode   associateid
nexusapp_341800   ppc_fli_1111
nexusweb_120000   ooc_htl_1010
primweb_1900000   ppc_fli_9999,ppc_fli_1777

As you can see, in some cases there is more than one associateid in a row, separated by a comma with no spaces.

With this line of code, I can get them into a list:

b = pd.DataFrame(budgetdf.associateid.str.split(',').tolist(), index=budgetdf.budgetcode).stack()

which looks like this:

associateid                    indexbudgetcode
[ppc_fli_9999, ppc_fli_1777]   primweb_1900000

But I can't seem to duplicate the list to create the final DataFrame:

indexbudgetcode   associateid
nexusapp_341800   ppc_fli_1111
nexusweb_120000   ooc_htl_1010
primweb_1900000   ppc_fli_9999
primweb_1900000   ppc_fli_1777

Can anyone shed light on an approach I can use to achieve this?

Thanks

Perhaps the easiest way to expand the associateids into separate rows is to use a generator expression to build the rows:

((index, item)
 for index, row in df['associateid'].str.split(',').items()
 for item in row)

You can pass the generator expression to pd.DataFrame to obtain the desired DataFrame.


import numpy as np
import pandas as pd

df = pd.DataFrame({
    'indexbudgetcode': ['nexusapp_341800', 'nexusweb_120000', 'primweb_1900000'],
    'associateid': ['ppc_fli_1111', 'ooc_htl_1010', 'ppc_fli_9999,ppc_fli_1777']})
df = df.set_index(['indexbudgetcode'])

# Split each comma-separated string into a list, then emit one
# (index, item) pair per list element.
# .items() is used here because .iteritems() was removed in pandas 2.0.
result = pd.DataFrame(((index, item)
                       for index, row in df['associateid'].str.split(',').items()
                       for item in row),
                      columns=['indexbudgetcode', 'associateid'])
print(result)

which yields the DataFrame

   indexbudgetcode   associateid
0  nexusapp_341800  ppc_fli_1111
1  nexusweb_120000  ooc_htl_1010
2  primweb_1900000  ppc_fli_9999
3  primweb_1900000  ppc_fli_1777
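The stack() idea from the question can also be carried through to the same result. The following is only a sketch under a few assumptions: it starts from the un-indexed sample frame, the names wide, stacked and stacked_result are made up for illustration, and the explicit dropna() is there because whether stack() discards the padding on its own may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'indexbudgetcode': ['nexusapp_341800', 'nexusweb_120000', 'primweb_1900000'],
    'associateid': ['ppc_fli_1111', 'ooc_htl_1010', 'ppc_fli_9999,ppc_fli_1777']})

# expand=True spreads the split pieces across columns 0, 1, ...;
# rows with fewer pieces are padded with missing values.
wide = df['associateid'].str.split(',', expand=True)
wide.index = df['indexbudgetcode']

# stack() moves the columns into a second index level, giving one value per row.
# dropna() removes any padding that survived the stack.
stacked = wide.stack().dropna()

# Drop the positional level and restore the two-column layout.
stacked_result = (stacked.reset_index(level=1, drop=True)
                         .rename('associateid')
                         .reset_index())
print(stacked_result)
```

The intermediate wide frame has one column per position in the longest list, so this variant may use more memory than the generator expression when a few rows hold many ids.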

Another way, which does not use a generator expression, is

result = df.groupby(level=0)['associateid'].apply(
    lambda grp: pd.Series(1, index=grp.str.split(',').tolist()))
result.index.names = ['indexbudgetcode', 'associateid']
result = result.reset_index(['associateid'])
result = result[['associateid']]

which yields the DataFrame

                  associateid
indexbudgetcode
nexusapp_341800  ppc_fli_1111
nexusweb_120000  ooc_htl_1010
primweb_1900000  ppc_fli_9999
primweb_1900000  ppc_fli_1777
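On newer pandas versions (0.25 and later), DataFrame.explode offers a shorter route to the same reshaping. This is not part of the approaches above, only a minimal sketch that assumes the same sample data with indexbudgetcode left as an ordinary column; the name exploded is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'indexbudgetcode': ['nexusapp_341800', 'nexusweb_120000', 'primweb_1900000'],
    'associateid': ['ppc_fli_1111', 'ooc_htl_1010', 'ppc_fli_9999,ppc_fli_1777']})

# Turn each comma-separated string into a list, then give every list
# element its own row; the other columns are repeated alongside it.
exploded = (df.assign(associateid=df['associateid'].str.split(','))
              .explode('associateid')
              .reset_index(drop=True))
print(exploded)
```

explode keeps the original row label for every piece, so reset_index(drop=True) is used to renumber the rows from 0.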
