Input DataFrame:
case constant number code
761e7 C20 3570 A
761e7 C20 2780 A
761e7 C20 7150 A
761e7 C20 2950 A
761e7 C20 3570 B
761e7 C20 2780 B
761e7 C20 7150 B
761e7 C20 2950 B
761e7 C21 3000 A
761e8 C20 3570 A
761e8 C20 2780 A
761e8 C20 7150 A
761e8 C20 2950 A
761e8 C14 3570 B
761e8 C14 2780 B
761e8 C14 7150 B
I'm trying to convert the repeated number column into multiple columns based on the other columns.
Pandas pivot gives me a ValueError, as shown:
df = final_df.pivot(index='case', columns='number')
ValueError: Index contains duplicate entries, cannot reshape
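The error appears to come from repeated (case, number) pairs: pivot needs each index/column combination to identify exactly one cell. A minimal sketch that reproduces this on a three-row frame (the tiny frame and the names `has_dupes`, `raised`, and `message` are illustrative, not taken from the original data):

```python
import pandas as pd

# Illustrative frame: the pair ("761e7", 3570) occurs twice.
df = pd.DataFrame({
    "case": ["761e7", "761e7", "761e7"],
    "number": [3570, 3570, 2780],
    "code": ["A", "B", "A"],
})

# True when at least one (case, number) pair is repeated.
has_dupes = bool(df.duplicated(subset=["case", "number"]).any())

try:
    df.pivot(index="case", columns="number")
    raised = False
    message = ""
except ValueError as exc:
    # pivot cannot place two values in the same cell.
    raised = True
    message = str(exc)

print(has_dupes, raised, message)
```

Checking `duplicated` on the intended index and column keys before pivoting is a quick way to see whether the reshape can work at all.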
Expected output:
case constant code number1 number2 number3 number4 number5
761e7 C20 A 3570 2780 7150 2950 0
761e7 C21 A 0 0 0 0 3000
761e7 C20 B 3570 2780 7150 2950 0
761e8 C20 A 3570 2780 7150 2950 0
761e8 C14 B 3570 2780 7150 0 0
IIUC, try:
g = df.groupby(['case','constant','code'])
df_out = df.set_index(['case','constant','code',g.cumcount()+1]).unstack(fill_value=0)
df_out.columns = [f'{i}{j}' for i, j in df_out.columns]
df_out.reset_index()
Output:
case constant code number1 number2 number3 number4
0 761e7 C20 A 3570 2780 7150 2950
1 761e7 C20 B 3570 2780 7150 2950
2 761e7 C21 A 3000 0 0 0
3 761e8 C14 B 3570 2780 7150 0
4 761e8 C20 A 3570 2780 7150 2950
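For anyone who wants to run this end to end, here is a self-contained sketch of the same cumcount idea. It assumes `case` is read as a string and rebuilds the question's data by hand; the names `rows`, `keys`, `pos`, and `wide` are mine, and it unstacks a single Series so the column flattening step is not needed.

```python
import pandas as pd

# Rebuild the question's data (case kept as a string, which is an assumption).
numbers = [3570, 2780, 7150, 2950]
rows = (
    [("761e7", "C20", n, "A") for n in numbers]
    + [("761e7", "C20", n, "B") for n in numbers]
    + [("761e7", "C21", 3000, "A")]
    + [("761e8", "C20", n, "A") for n in numbers]
    + [("761e8", "C14", n, "B") for n in numbers[:3]]
)
df = pd.DataFrame(rows, columns=["case", "constant", "number", "code"])

keys = ["case", "constant", "code"]
# Position of each row inside its (case, constant, code) group: 1, 2, 3, ...
pos = df.groupby(keys).cumcount() + 1

wide = (
    df.assign(pos=pos)
      .set_index(keys + ["pos"])["number"]  # unique index, so unstack is safe
      .unstack(fill_value=0)                # one column per position
      .add_prefix("number")                 # 1 -> number1, 2 -> number2, ...
      .reset_index()
      .rename_axis(columns=None)            # drop the leftover "pos" label
)
print(wide)
```

Because the position counter is unique within each group, the index passed to `unstack` has no duplicates, which is what avoids the ValueError from the question.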
A more common approach is to use the values themselves as column names and let the rows hold counts, for example:
df.pivot_table(index=['case','constant','code'],
               columns='number', aggfunc=len, fill_value=0).reset_index()
yields
number case constant code 2780 2950 3000 3570 7150
0 7.610000e+09 C20 A 1 1 0 1 1
1 7.610000e+09 C20 B 1 1 0 1 1
2 7.610000e+09 C21 A 0 0 1 0 0
3 7.610000e+10 C14 B 1 0 0 1 1
4 7.610000e+10 C20 A 1 1 0 1 1
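An equivalent count table can also be sketched with `groupby(...).size()` followed by `unstack`, which is a swapped-in technique rather than the `pivot_table` call above. The data is rebuilt by hand with `case` as a string (an assumption), and `rows` and `counts` are illustrative names.

```python
import pandas as pd

numbers = [3570, 2780, 7150, 2950]
rows = (
    [("761e7", "C20", n, "A") for n in numbers]
    + [("761e7", "C20", n, "B") for n in numbers]
    + [("761e7", "C21", 3000, "A")]
    + [("761e8", "C20", n, "A") for n in numbers]
    + [("761e8", "C14", n, "B") for n in numbers[:3]]
)
df = pd.DataFrame(rows, columns=["case", "constant", "number", "code"])

counts = (
    df.groupby(["case", "constant", "code", "number"])
      .size()                            # occurrences of each number per group
      .unstack("number", fill_value=0)   # one column per distinct number
      .reset_index()
      .rename_axis(columns=None)
)
print(counts)
```

The column labels here are the integer values themselves (2780, 2950, and so on), so a cell is 1 where that number occurs in the group and 0 where it does not.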
df1.assign(
    col1=df1.groupby(df1.code.ne(df1.code.shift()).cumsum()).number.cumcount().add(1)
).pivot_table(
    index=['case', 'constant', 'code'],
    columns='col1', values='number', fill_value=0
).add_prefix('number').reset_index()
col1 case constant code number1 number2 number3 number4 number5
0 761e7 C20 A 3570 2780 7150 2950 0
1 761e7 C20 B 3570 2780 7150 2950 0
2 761e7 C21 A 3000 0 0 0 0
3 761e8 C14 B 3570 2780 7150 0 0
4 761e8 C20 A 0 3570 2780 7150 2950
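Note that in the output above the 761e8 / C20 / A row is shifted one column to the right, because the counter runs over consecutive runs of `code` and the C21 row belongs to the same run. If the intent of the expected output is that each distinct number gets its own column in order of first appearance (my reading of the question, not something it states), a sketch using `pd.factorize` would look like this. The names `rows`, `keys`, `slot`, and `wide` are illustrative.

```python
import pandas as pd

numbers = [3570, 2780, 7150, 2950]
rows = (
    [("761e7", "C20", n, "A") for n in numbers]
    + [("761e7", "C20", n, "B") for n in numbers]
    + [("761e7", "C21", 3000, "A")]
    + [("761e8", "C20", n, "A") for n in numbers]
    + [("761e8", "C14", n, "B") for n in numbers[:3]]
)
df = pd.DataFrame(rows, columns=["case", "constant", "number", "code"])

keys = ["case", "constant", "code"]
# Integer code per distinct number, in order of first appearance:
# 3570 -> 0, 2780 -> 1, 7150 -> 2, 2950 -> 3, 3000 -> 4
slot, _ = pd.factorize(df["number"])

wide = (
    df.assign(slot=slot + 1)
      .set_index(keys + ["slot"])["number"]
      .unstack(fill_value=0)
      .add_prefix("number")
      .reset_index()
      .rename_axis(columns=None)
)
print(wide)
```

With this mapping the C21 row carries 3000 in number5, as in the expected output, although the rows come out sorted by the key columns rather than in the order shown in the question.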