Python / Pandas: If a value is NaN or 0, fill it with the value from the next column in the same row

Votes: 3 · Answers: 2

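I have seen several posts, but they either only cover a single column, or only handle NaN values or only 0 values, not both.

My df looks like this. I want to fill the "Main" column with the string found in the four columns to its right that is neither missing nor zero.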

Current df:

import pandas as pd

d = {'Main': ['','','',''], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)

+------+------+--------+-------+--------+
| Main | Col2 | Col3   | Col4  | Col5   |
+------+------+--------+-------+--------+
|      | Big  | 0      | ...   |        |
+------+------+--------+-------+--------+
|      | ...  | Medium | ...   | 0      |
+------+------+--------+-------+--------+
|      |      | 0      | Small |        |
+------+------+--------+-------+--------+
|      | 0    | ...    | ...   | Vsmall |
+------+------+--------+-------+--------+

Desired output df:

+--------+------+--------+-------+--------+
| Main   | Col2 | Col3   | Col4  | Col5   |
+--------+------+--------+-------+--------+
| Big    | Big  | 0      | ...   |        |
+--------+------+--------+-------+--------+
| Medium | ...  | Medium | ...   | 0      |
+--------+------+--------+-------+--------+
| Small  |      | 0      | Small |        |
+--------+------+--------+-------+--------+
| Vsmall | 0    | ...    | ...   | Vsmall |
+--------+------+--------+-------+--------+
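As a quick illustration of why recipes that only handle NaN fall short here, the sketch below rebuilds the sample frame and checks that pandas does not treat '' or 0 as missing values, so a plain row-wise bfill leaves the first column unchanged. The names `c`, `has_missing` and `naive` are introduced only for illustration.

```python
import pandas as pd

# The sample frame from the question.
d = {'Main': ['', '', '', ''], 'col2': ['Big', '', '', 0], 'col3': [0, 'Medium', 0, ''],
     'col4': ['', '', 'Small', ''], 'col5': ['', 0, '', 'Vsmall']}
df = pd.DataFrame(data=d)
c = ['col2', 'col3', 'col4', 'col5']  # illustrative list of source columns

# Empty strings and 0 are ordinary values to pandas, not missing values.
has_missing = bool(df[c].isna().any().any())

# So back-filling along each row skips nothing, and the first column comes back
# unchanged ('Big', '', '', 0) instead of Big/Medium/Small/Vsmall.
naive = df[c].bfill(axis=1).iloc[:, 0].tolist()
print(has_missing, naive)
```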

Thanks in advance!

python pandas dataframe fill
2 Answers
2 votes

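The idea is to use DataFrame.mask to replace 0 and empty strings with missing values, then back-fill the missing values along each row, and finally select the first column. (The printed output was produced on a smaller sample frame with columns col1 to col3, so it does not match the question's four-column frame.)

c = ['col2','col3','col4','col5']
df['Main'] = df[c].mask(df.isin(['0','',0])).bfill(axis=1).iloc[:, 0]
print(df)
     Main col1    col2   col3
0     Big  Big    None       
1  Medium    0  Medium   None
2   Small            0  Small

If you can list every string that might be extracted, use DataFrame.where instead to replace all other values:

df['Main'] = df[c].where(df.isin(['Big','Medium','Small','Vsmall'])).bfill(axis=1).iloc[:,0]
print(df)
     Main col1    col2   col3
0     Big  Big    None       
1  Medium    0  Medium   None
2   Small            0  Small

Details:

print(df[c].mask(df.isin(['0','',0])))
#print(df[c].where(df.isin(['Big','Medium','Small','Vsmall'])))
  col1    col2   col3
0  Big    None    NaN
1  NaN  Medium   None
2  NaN     NaN  Small

print(df[c].mask(df.isin(['0','',0])).bfill(axis=1))
     col1    col2   col3
0     Big     NaN    NaN
1  Medium  Medium   None
2   Small   Small  Small

0 votes

From the sample data you provided, I think what you are trying to do is decode one-hot encoded data (one-hot encoding is a classic machine-learning technique for converting categorical data into numeric data).

Here is the code that performs the decoding:

Note: always prefer reductions over the whole DataFrame to iterating over rows.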
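The decoding idea from the second answer can be sketched as a vectorized reduction instead of a row loop. This is not the answerer's original code, which did not survive; it is a sketch under stated assumptions: any cell that is not NaN, '', 0 or '0' counts as the "hot" cell, two hypothetical extra rows with real NaN values cover the NaN case from the title, and NumPy's argmax picks the first hot column in each row. The names `hot`, `first` and `values` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question plus two hypothetical rows (assumption) that
# use real NaN values, since the title mentions NaN but the sample has none.
d = {'Main': [''] * 6,
     'col2': ['Big', '', '', 0, np.nan, np.nan],
     'col3': [0, 'Medium', 0, '', np.nan, 0],
     'col4': ['', '', 'Small', '', 'Medium', ''],
     'col5': ['', 0, '', 'Vsmall', np.nan, np.nan]}
df = pd.DataFrame(data=d)
c = ['col2', 'col3', 'col4', 'col5']

# Assumption: a cell is "hot" (usable) if it is neither missing nor a placeholder.
hot = df[c].notna() & ~df[c].isin(['', 0, '0'])

# Row-wise argmax returns the position of the first True in each row,
# i.e. the first usable column, without a Python loop over rows.
first = hot.to_numpy().argmax(axis=1)
values = df[c].to_numpy()[np.arange(len(df)), first]

# Rows with no usable cell get NaN instead of whatever sits in col2.
df['Main'] = np.where(hot.any(axis=1), values, np.nan)
print(df)
```

Because argmax returns 0 for a row with no True values, the np.where guard sets Main to NaN for such rows (the last hypothetical row here), which is also what the mask/bfill approach yields. If every row is guaranteed to contain exactly one usable string, the guard can be dropped.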
