I've read several posts, but they either only work for a single-column example or only handle NaN or 0 values, not both at once.
My df looks like this. I want to fill the "Main" column with whichever string in the four columns to its right is neither missing nor 0.
Current df:
import pandas as pd
d = {'Main': ['','','',''], 'col2': ['Big','','',0], 'col3': [0,'Medium',0,''], 'col4': ['','','Small',''], 'col5':['',0,'','Vsmall']}
df = pd.DataFrame(data=d)
+------+------+--------+-------+--------+
| Main | Col2 | Col3   | Col4  | Col5   |
+------+------+--------+-------+--------+
|      | Big  | 0      | ...   |        |
+------+------+--------+-------+--------+
|      | ...  | Medium | ...   | 0      |
+------+------+--------+-------+--------+
|      |      | 0      | Small |        |
+------+------+--------+-------+--------+
|      | 0    | ...    | ...   | Vsmall |
+------+------+--------+-------+--------+
Desired output df:
+--------+------+--------+-------+--------+
| Main   | Col2 | Col3   | Col4  | Col5   |
+--------+------+--------+-------+--------+
| Big    | Big  | 0      | ...   |        |
+--------+------+--------+-------+--------+
| Medium | ...  | Medium | ...   | 0      |
+--------+------+--------+-------+--------+
| Small  |      | 0      | Small |        |
+--------+------+--------+-------+--------+
| Vsmall | 0    | ...    | ...   | Vsmall |
+--------+------+--------+-------+--------+
Thanks in advance!
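The idea is to replace 0 and empty strings with missing values using DataFrame.mask, then back-fill the missing values along each row and finally select the first column:

c = ['col2','col3','col4','col5']
df['Main'] = df[c].mask(df.isin(['0','',0])).bfill(axis=1).iloc[:, 0]
print (df)

Note that the printed output comes from a smaller sample with columns col1 to col3, not the four-column df above:

     Main  col1    col2   col3
0     Big   Big    None
1  Medium     0  Medium   None
2   Small     0          Small

If possible, create a list of all the strings you expect to extract and replace all other values with DataFrame.where: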
c = ['col2','col3','col4','col5']
df['Main'] = df[c].where(df.isin(['Big','Medium','Small','Vsmall'])).bfill(axis=1).iloc[:,0]
print (df)

     Main  col1    col2   col3
0     Big   Big    None
1  Medium     0  Medium   None
2   Small     0          Small

Details:

print (df[c].mask(df.isin(['0','',0])))
#print (df[c].where(df.isin(['Big','Medium','Small','Vsmall'])))
   col1    col2   col3
0   Big    None    NaN
1   NaN  Medium   None
2   NaN     NaN  Small

print (df[c].mask(df.isin(['0','',0])).bfill(axis=1))
     col1    col2   col3
0     Big     NaN    NaN
1  Medium  Medium   None
2   Small   Small  Small

From the sample data you provided, I think what you want to achieve is decoding one-hot encoded data (one-hot encoding is a classic machine-learning technique for converting categorical data into numerical data).

Note: always consider using reductions over the whole DataFrame rather than iterating over rows.
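The decoding idea above, combined with the advice to reduce over the DataFrame instead of looping over rows, can be sketched as follows. This is not the answerer's original code; it is a minimal sketch, assuming that '', 0, '0' and NaN/None all count as "empty", and the names `valid`, `first`, `has_value`, `picked` and `bfilled` are hypothetical helpers introduced here for illustration. It finds the first non-empty cell in each row with NumPy's `argmax` on a boolean mask and picks it with fancy indexing, then computes the mask + bfill result for comparison.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; col2..col5 hold the candidate strings.
d = {'Main': ['', '', '', ''],
     'col2': ['Big', '', '', 0],
     'col3': [0, 'Medium', 0, ''],
     'col4': ['', '', 'Small', ''],
     'col5': ['', 0, '', 'Vsmall']}
df = pd.DataFrame(data=d)
c = ['col2', 'col3', 'col4', 'col5']

# Boolean "decoding" mask (hypothetical helper): True where a cell holds a
# real value, i.e. it is not '', 0 or '0' and is not NaN/None.
valid = (~df[c].isin(['', 0, '0']) & df[c].notna()).to_numpy()

# argmax on booleans returns the position of the first True in each row.
first = valid.argmax(axis=1)
has_value = valid.any(axis=1)

# Pick that cell per row with NumPy fancy indexing (no Python row loop);
# rows with no valid cell get NaN instead of a spurious pick from col2.
picked = df[c].to_numpy()[np.arange(len(df)), first]
df['Main'] = np.where(has_value, picked, np.nan)

# For comparison: the mask + bfill approach from the first answer.
bfilled = df[c].mask(df[c].isin(['0', '', 0])).bfill(axis=1).iloc[:, 0]

print(df['Main'].tolist())
```

On the question's sample both `df['Main']` and `bfilled` should come out as Big, Medium, Small, Vsmall. The argmax version makes the "first valid cell wins" rule explicit, and the `notna()` term is what lets genuine NaN cells be treated the same way as '' and 0; with the `has_value` guard, a row with no valid value yields NaN, as mask + bfill does.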