Unmelt a pandas DataFrame, removing NaN [duplicate]


Given the following data:

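from io import StringIO
import pandas as pd

s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))  # wrap the literal JSON; newer pandas deprecates passing a raw string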

It looks like this:

   PassengerId  Survived  Pclass
0            1         0       3
1            2         1       1
2            3         1       3

Suppose it has been melted into:

m = df.melt()
print(m)

      variable  value
0  PassengerId      1
1  PassengerId      2
2  PassengerId      3
3     Survived      0
4     Survived      1
5     Survived      1
6       Pclass      3
7       Pclass      1
8       Pclass      3

I would like to know how to restore the melted m to the original df.

I tried something like the following:

m=df.melt().pivot(columns='variable', values='value').reset_index(drop=True)
m.columns.name = None

The result is:

   PassengerId  Pclass  Survived
0          1.0     NaN       NaN
1          2.0     NaN       NaN
2          3.0     NaN       NaN
3          NaN     NaN       0.0
4          NaN     NaN       1.0
5          NaN     NaN       1.0
6          NaN     3.0       NaN
7          NaN     1.0       NaN
8          NaN     3.0       NaN

As you can see, each row holds the value for only one column, and the result is full of NaN values that I want to get rid of.

python pandas dataframe data-manipulation
1 answer
3 votes

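Without an index argument, DataFrame.pivot uses the existing row labels 0 to 8, so every value lands in its own row. Use GroupBy.cumcount to create a new column that numbers the rows within each variable, then pass it as the index parameter of DataFrame.pivot: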

m['new'] = m.groupby('variable').cumcount()

df = m.pivot(columns='variable', values='value', index='new')
print(df)

variable  PassengerId  Pclass  Survived
new                                    
0                   1       3         0
1                   2       1         1
2                   3       3         1

Or:

df = (m.assign(new = m.groupby('variable').cumcount())
       .pivot(columns='variable', values='value', index='new'))
print(df)

variable  PassengerId  Pclass  Survived
new                                    
0                   1       3         0
1                   2       1         1
2                   3       3         1
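The pivoted result above has its columns sorted alphabetically and keeps the `new` and `variable` axis labels. A minimal sketch of a full round trip follows. It assumes every variable has the same number of rows in `m` and that the rows of each variable are still in their original order. The names `pos`, `original_order` and `restored` are illustrative only and do not come from the question.

```python
from io import StringIO

import pandas as pd

s = '{"PassengerId":{"0":1,"1":2,"2":3},"Survived":{"0":0,"1":1,"2":1},"Pclass":{"0":3,"1":1,"2":3}}'
df = pd.read_json(StringIO(s))
m = df.melt()

# Number the rows within each variable: 0, 1, 2 for every group.
pos = m.groupby('variable').cumcount()

# melt() emits the variables in the original column order,
# so the order of first appearance recovers it.
original_order = m['variable'].unique()

restored = (
    m.assign(new=pos)
     .pivot(index='new', columns='variable', values='value')
     .reindex(columns=original_order)         # undo the alphabetical column sort
     .rename_axis(index=None, columns=None)   # drop the 'new' / 'variable' labels
)
print(restored)
```

If the groups had different lengths, the shorter ones would still produce NaN in the trailing rows, so this sketch only restores the frame exactly for a complete melt like the one in the question.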