Given the following data:
data = {'Org': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Tom': [NaN, 1, 1, 1, NaN, NaN],
        'Kelly': [1, 1, 1, 1, NaN, 1],
        'Rick': [1, 1, 1, 1, 1, 1],
        'Dave': [1, NaN, 1, NaN, 1, NaN],
        'Sara': [1, 1, 1, 1, 0, 1],
        'Liz': [NaN, 1, 1, 1, NaN, 1]}
df = pd.DataFrame(data)
I want to sum the columns other than the first two, and then replace each non-NaN value with the resulting sum.
The result should look like this:
data = {'Org': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Tom': [NaN, 5, 6, 5, NaN, NaN],
        'Kelly': [1, 5, 6, 5, NaN, 4],
        'Rick': [1, 5, 6, 5, 2, 4],
        'Dave': [1, NaN, 6, NaN, 2, NaN],
        'Sara': [1, 5, 6, 5, NaN, 4],
        'Liz': [NaN, 5, 6, 5, NaN, 4]}
I tried:
column_sums = df.iloc[:, 2:].sum()
for column in df.iloc[:, 2:].columns:
    df[column] = column_sums[column]
But this replaced all of my values, including the Org names and my NaNs.
Is there a clean solution for this?
Thanks
Build a mask of the non-NaN cells, sum it across each row, and modify in place:
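import numpy as np
import pandas as pd
df = pd.DataFrame(data)  # `data` from the question, with NaN = np.nan
m = df.iloc[:, 1:].notna()  # True for non-NaN cells, skipping 'Org'
# Broadcast each row's non-NaN count across the row; only masked cells change
df[m] = np.repeat(m.sum(axis=1).to_numpy()[:, None],
                  df.shape[1], axis=1)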
Output:
Org Tom Kelly Rick Dave Sara Liz
0 A NaN 4.0 4 4.0 4 NaN
1 B 5.0 5.0 5 NaN 5 5.0
2 C 6.0 6.0 6 6.0 6 6.0
3 D 5.0 5.0 5 NaN 5 5.0
4 E NaN NaN 3 3.0 3 NaN
5 F NaN 4.0 4 NaN 4 4.0
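Note that the approach above counts the non-NaN cells in each row. That count equals the row sum only when every value is 1, so the two differ in row E, where Sara is 0. If the row sum itself is what is wanted, here is a minimal sketch using `DataFrame.mask`. The names `vals`, `row_sums` and `result` are illustrative and not part of the original answer, and the sketch assumes that `NaN` in the question stands for `np.nan`.

```python
import numpy as np
import pandas as pd

data = {'Org': ['A', 'B', 'C', 'D', 'E', 'F'],
        'Tom': [np.nan, 1, 1, 1, np.nan, np.nan],
        'Kelly': [1, 1, 1, 1, np.nan, 1],
        'Rick': [1, 1, 1, 1, 1, 1],
        'Dave': [1, np.nan, 1, np.nan, 1, np.nan],
        'Sara': [1, 1, 1, 1, 0, 1],
        'Liz': [np.nan, 1, 1, 1, np.nan, 1]}
df = pd.DataFrame(data)

# Numeric columns only (everything after 'Org').
vals = df.iloc[:, 1:]

# Row-wise sum; NaN cells are skipped by default (skipna=True).
row_sums = vals.sum(axis=1)

# Where a cell is non-NaN, replace it with its row's sum; NaN cells stay NaN.
# axis=0 aligns row_sums with the row index.
result = df.copy()
result[vals.columns] = vals.mask(vals.notna(), row_sums, axis=0)

print(result)
```

With this data, the non-NaN cells in row E become 2 rather than 3, because Sara's 0 counts as a non-NaN cell but adds nothing to the sum. Writing to a copy leaves the original `df` untouched; assign back to `df` if an in-place update is preferred.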