Given the following data:
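import pandas as pd
from io import StringIO
s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))  # StringIO avoids the literal-JSON deprecation warning in pandas >= 2.1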
It looks like this:
   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3
Suppose it has been melted with:
m = df.melt()
print(m)
      variable  value
0  PassengerId      1
1  PassengerId      2
2  PassengerId      3
3     Survived      0
4     Survived      1
5     Survived      1
6       Pclass      3
7       Pclass      1
8       Pclass      3
I would like to know how to turn the melted m back into the original df.
I tried something like the following:
m=df.melt().pivot(columns='variable', values='value').reset_index(drop=True)
m.columns.name = None
The result is:
   PassengerId  Pclass  Survived
0          1.0     NaN       NaN
1          2.0     NaN       NaN
2          3.0     NaN       NaN
3          NaN     NaN       0.0
4          NaN     NaN       1.0
5          NaN     NaN       1.0
6          NaN     3.0       NaN
7          NaN     1.0       NaN
8          NaN     3.0       NaN
As you can see, each row holds the value for only one column, and the result is full of NaN values that I want to get rid of.
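For context, the NaN-heavy result comes from how pivot picks its row labels: when index is not given, it falls back to the melted frame's own index, which runs from 0 to 8, so every value gets a row to itself. A minimal sketch that makes this visible, assuming pandas 2.x (the name wide is only for illustration):

```python
from io import StringIO

import pandas as pd

s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))
m = df.melt()

# Without index=..., pivot uses m.index as the row labels.
# That index is 0..8 and unique per row, so nothing gets combined.
print(m.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8]

wide = m.pivot(columns='variable', values='value')
print(wide.shape)  # (9, 3): one output row per melted row

# Each output row carries exactly one non-NaN value.
print(int(wide.notna().sum(axis=1).max()))  # 1
```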
Use GroupBy.cumcount to build a new counter column, then pass it as the index argument of DataFrame.pivot:
# number the rows within each variable: 0, 1, 2 for every original column
m['new'] = m.groupby('variable').cumcount()
df = m.pivot(columns='variable', values='value', index='new')
print(df)
variable  PassengerId  Pclass  Survived
new
0                   1       3         0
1                   2       1         1
2                   3       3         1
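Why this works: melt stacks the original columns one after another, each in its original row order, so counting rows within each variable group recovers the original row position. A quick check on the same data, assuming pandas is installed:

```python
from io import StringIO

import pandas as pd

s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))
m = df.melt()

# cumcount restarts at 0 for each group, so within every variable
# it yields 0, 1, 2, i.e. the row position in the original df.
new = m.groupby('variable').cumcount()
print(new.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```

This relies on m still being in the order melt produced; if its rows were shuffled or filtered first, the counter would no longer line up with the original rows.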
Or, without adding a column to m:
df = (m.assign(new=m.groupby('variable').cumcount())
       .pivot(columns='variable', values='value', index='new'))
print(df)
variable  PassengerId  Pclass  Survived
new
0                   1       3         0
1                   2       1         1
2                   3       3         1
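Both versions return the columns sorted alphabetically and keep new and variable as axis names. If an exact copy of the original frame is wanted, one possible finishing step is sketched below, assuming the original column order is still available as df.columns (restored is an illustrative name):

```python
from io import StringIO

import pandas as pd

s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))
m = df.melt()

restored = (
    m.assign(new=m.groupby('variable').cumcount())
     .pivot(columns='variable', values='value', index='new')
     # pivot sorts the columns; put them back in the original order
     .reindex(columns=df.columns)
     # drop the 'new' / 'variable' axis names left over from the pivot
     .rename_axis(index=None, columns=None)
)

print(restored)
print(restored.equals(df))  # True
```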