I am trying to restructure a DataFrame. Here is the original DataFrame:
import pandas as pd

df = pd.DataFrame(
    [['January', 'Monday', 0, 1, 20],
     ['January', 'Monday', 1, 2, 15],
     ['January', 'Wednesday', 0, 1, 35],
     ['March', 'Monday', 0, 1, 23],
     ['March', 'Monday', 1, 2, 50],
     ['March', 'Monday', 2, 3, 60],
     ['April', 'Wednesday', 0, 1, 75]],
    columns=['Month', 'Day', 'Data1', 'Data2', 'Random'])
     Month        Day  Data1  Data2  Random
0  January     Monday      0      1      20
1  January     Monday      1      2      15
2  January  Wednesday      0      1      35
3    March     Monday      0      1      23
4    March     Monday      1      2      50
5    March     Monday      2      3      60
6    April  Wednesday      0      1      75
My goal is to get the following result:
     Month        Day  0    1    2
0  January     Monday  1  2.0  NaN
1  January     Monday  1  2.0  NaN
2  January  Wednesday  1  NaN  NaN
3    March     Monday  1  2.0  3.0
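I tried pivot_table as shown below, but of course it does not give me what I want: pivot_table collapses duplicate index entries instead of keeping them, and the result has a MultiIndex, which causes problems later in my process.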
df1 = pd.pivot_table(df, values = 'Data2', index = ['Month','Day'], columns = ['Data1'])
Data1                0    1    2
Month   Day
April   Wednesday  1.0  NaN  NaN
January Monday     1.0  2.0  NaN
        Wednesday  1.0  NaN  NaN
March   Monday     1.0  2.0  3.0
Is there another way to get the result I want? Thanks very much in advance.
You can try groupby with unstack:
df.groupby(['Month','Day','Data2'])['Data2'].first().unstack().reset_index()
Output:
Data2    Month        Day    1    2    3
0        April  Wednesday  1.0  NaN  NaN
1      January     Monday  1.0  2.0  NaN
2      January  Wednesday  1.0  NaN  NaN
3        March     Monday  1.0  2.0  3.0
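
Note that the column labels above come from Data2 (1, 2, 3), while the desired result in the question labels the columns by Data1 (0, 1, 2). If that is what you need, here is a minimal sketch of the same groupby/unstack idea with Data1 as the last grouping key. It assumes each (Month, Day, Data1) combination appears at most once, or that keeping the first Data2 value is acceptable. The helper name `widen` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['January', 'Monday', 0, 1, 20],
     ['January', 'Monday', 1, 2, 15],
     ['January', 'Wednesday', 0, 1, 35],
     ['March', 'Monday', 0, 1, 23],
     ['March', 'Monday', 1, 2, 50],
     ['March', 'Monday', 2, 3, 60],
     ['April', 'Wednesday', 0, 1, 75]],
    columns=['Month', 'Day', 'Data1', 'Data2', 'Random'])


def widen(frame):
    """Hypothetical helper: one row per (Month, Day), Data2 spread over Data1."""
    wide = (
        frame.groupby(['Month', 'Day', 'Data1'])['Data2']
        .first()        # keep the first Data2 value for each key combination
        .unstack()      # the last key, Data1 (0, 1, 2), becomes the column labels
        .reset_index()  # turn Month and Day back into ordinary columns
    )
    wide.columns.name = None  # drop the leftover 'Data1' label on the column axis
    return wide


result = widen(df)
print(result)
```

Since reset_index moves Month and Day back into regular columns, the result has a plain index rather than a MultiIndex. The rows come out sorted by the group keys (April first); pass sort=False to groupby if you would rather keep the order of first appearance.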