Convert repeated rows into multiple columns with headers

Question  Votes: 0  Answers: 3

Input DataFrame:

case    constant    number  code        
761e7   C20         3570    A   
761e7   C20         2780    A   
761e7   C20         7150    A   
761e7   C20         2950    A   
761e7   C20         3570    B   
761e7   C20         2780    B   
761e7   C20         7150    B   
761e7   C20         2950    B
761e7   C21         3000    A   
761e8   C20         3570    A   
761e8   C20         2780    A   
761e8   C20         7150    A   
761e8   C20         2950    A   
761e8   C14         3570    B   
761e8   C14         2780    B   
761e8   C14         7150    B

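I am trying to spread the repeated values of the number column across multiple columns, with one row per combination of the other columns.

Pandas pivot gives me a ValueError, as shown: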

df = final_df.pivot(index='case', columns='number')

ValueError: Index contains duplicate entries, cannot reshape
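
For reference, here is a minimal sketch of why this call fails, using a three-row subset of the data above (the names `dupes` and `n_dupes` are illustrative only). `pivot` needs every (index, columns) pair to be unique, and here the pair ("761e7", 3570) occurs more than once:

```python
import pandas as pd

# Small subset of the input: the same case/number pair appears for code A and code B
df = pd.DataFrame({
    "case": ["761e7", "761e7", "761e7"],
    "constant": ["C20", "C20", "C21"],
    "number": [3570, 3570, 3000],
    "code": ["A", "B", "A"],
})

# Flag every row whose (case, number) pair is not unique
dupes = df.duplicated(subset=["case", "number"], keep=False)
n_dupes = int(dupes.sum())

# pivot cannot put two rows into the same cell, so it raises ValueError
try:
    df.pivot(index="case", columns="number")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(n_dupes, pivot_failed)
```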

Expected output:

case    constant    code    number1 number2 number3 number4 number5
761e7   C20         A       3570    2780    7150    2950    0
761e7   C21         A       0       0       0       0       3000
761e7   C20         B       3570    2780    7150    2950    0
761e8   C20         A       3570    2780    7150    2950    0
761e8   C14         B       3570    2780    7150    0       0
python pandas
3 Answers
1
vote

IIUC, try:

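# Group by the columns that identify one output row
g = df.groupby(['case','constant','code'])

# Add each row's position within its group as an extra index level, then unstack it into columns
df_out = df.set_index(['case','constant','code',g.cumcount()+1]).unstack(fill_value=0)
# Flatten the ('number', 1) style column labels into 'number1', 'number2', ...
df_out.columns = [f'{i}{j}' for i, j in df_out.columns]
df_out.reset_index()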

Output:

    case constant code  number1  number2  number3  number4
0  761e7      C20    A     3570     2780     7150     2950
1  761e7      C20    B     3570     2780     7150     2950
2  761e7      C21    A     3000        0        0        0
3  761e8      C14    B     3570     2780     7150        0
4  761e8      C20    A     3570     2780     7150     2950
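
For reference, the approach above can be run end to end against the sample data from the question. This is a sketch rather than the exact code from the answer: it selects the number column before unstacking so the result has single-level columns, and the names `keys`, `pos`, and `wide` are illustrative.

```python
import pandas as pd

# Sample data from the question; case is kept as a string
rows = (
    [("761e7", "C20", n, "A") for n in (3570, 2780, 7150, 2950)]
    + [("761e7", "C20", n, "B") for n in (3570, 2780, 7150, 2950)]
    + [("761e7", "C21", 3000, "A")]
    + [("761e8", "C20", n, "A") for n in (3570, 2780, 7150, 2950)]
    + [("761e8", "C14", n, "B") for n in (3570, 2780, 7150)]
)
df = pd.DataFrame(rows, columns=["case", "constant", "number", "code"])

keys = ["case", "constant", "code"]

# Position of each row inside its (case, constant, code) group: 1, 2, 3, ...
pos = (df.groupby(keys).cumcount() + 1).rename("pos")

wide = (
    df.set_index(keys + [pos])["number"]  # the extra level makes the index unique
      .unstack(fill_value=0)              # positions become columns, gaps become 0
      .add_prefix("number")               # 1 -> number1, 2 -> number2, ...
      .reset_index()
)
wide.columns.name = None

print(wide)
```

Because the position counter restarts for every group, no two rows share the same index entry, which is what the plain pivot call was missing.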

1
vote

A more common approach is to use the values themselves as column names and have the rows hold counts, for example:

df.pivot_table(index=['case','constant','code'], 
               columns='number', aggfunc=len).reset_index()

which yields:

number          case constant code  2780  2950  3000  3570  7150
0       7.610000e+09      C20    A     1     1     0     1     1
1       7.610000e+09      C20    B     1     1     0     1     1
2       7.610000e+09      C21    A     0     0     1     0     0
3       7.610000e+10      C14    B     1     0     0     1     1
4       7.610000e+10      C20    A     1     1     0     1     1
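
A count table of this shape can also be sketched with `pd.crosstab`, which fills missing combinations with 0 and does not depend on `pivot_table` finding a leftover column to aggregate (when no `values` are given, `pivot_table` aggregates whatever columns remain, and a frame with only these four columns may leave it nothing to count). The sketch below assumes case is stored as a string, so 761e7 is not parsed as a float as in the output above; the name `counts` is illustrative.

```python
import pandas as pd

# Sample data from the question; case is kept as a string
rows = (
    [("761e7", "C20", n, "A") for n in (3570, 2780, 7150, 2950)]
    + [("761e7", "C20", n, "B") for n in (3570, 2780, 7150, 2950)]
    + [("761e7", "C21", 3000, "A")]
    + [("761e8", "C20", n, "A") for n in (3570, 2780, 7150, 2950)]
    + [("761e8", "C14", n, "B") for n in (3570, 2780, 7150)]
)
df = pd.DataFrame(rows, columns=["case", "constant", "number", "code"])

# Rows: one per (case, constant, code); columns: one per distinct number; cells: counts
counts = pd.crosstab(
    [df["case"], df["constant"], df["code"]],
    df["number"],
).reset_index()

print(counts)
```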

0
votes
df1.assign(
    col1=df1.groupby(df1.code.ne(df1.code.shift()).cumsum()).number.cumcount().add(1)
).pivot_table(
    index=['case', 'constant', 'code'], columns='col1', values='number', fill_value=0
).add_prefix('number').reset_index()

col1   case constant code  number1  number2  number3  number4  number5
0     761e7      C20    A     3570     2780     7150     2950        0
1     761e7      C20    B     3570     2780     7150     2950        0
2     761e7      C21    A     3000        0        0        0        0
3     761e8      C14    B     3570     2780     7150        0        0
4     761e8      C20    A        0     3570     2780     7150     2950
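
The last row of this output is shifted one column to the right, and this appears to come from the grouping key. `code.ne(code.shift()).cumsum()` labels runs of consecutive equal code values and ignores case and constant, so the single 761e7/C21/A row and the four 761e8/C20/A rows that follow it fall into one run. A minimal sketch of that behaviour, using only the code column in its original row order (the names `run_id` and `pos` are illustrative):

```python
import pandas as pd

# code column in the row order of the question: 4 x A, 4 x B, 5 x A, 3 x B
code = pd.Series(list("AAAABBBBAAAAABBB"))

# A new run starts whenever the value differs from the previous row
run_id = code.ne(code.shift()).cumsum()

# Position inside each run; the third run spans two different case/constant groups
pos = code.groupby(run_id).cumcount().add(1)

print(run_id.tolist())
print(pos.tolist())
```

Grouping by case, constant, and code together instead of by runs of code restarts the counter for every output row.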