Assign values to a column based on whether text is present

Question

I have a dataframe:

Fruit	Color
apple	'is green'
apple	'big and Green'
apple	'is red'
banana	'yellow"
banana	'is yellow'
banana	'is small and yellow"

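For every row where df['Fruit'] == 'apple', I want to create a third column called 'Green' that is binary: 1 for each value in the 'Color' column that contains the string 'green', and 0 for every value that does not. For every row where df['Fruit'] == 'banana', df['Green'] should be NA.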

Fruit	Color	Green
apple	'is green'	1
apple	'big and Green'	1
apple	'is red'	0
banana	'yellow"	NA
banana	'is yellow'	NA
banana	'is small and yellow"	NA

I get an error when I run this command:

df['Green'] = np.where((df['Fruit'] == 'apple') & (df['Color].str.lower().str.contains('green') == True ), 1, 0)

Answer 1

Create boolean masks, then use boolean indexing with loc to assign the values:

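# m1: rows where the fruit is an apple
m1 = df['Fruit'].eq('apple')
# m2: rows whose color contains "green"; (?i) makes the regex case-insensitive
m2 = df['Color'].str.contains('(?i)green')
# Assign only the apple rows; rows left unassigned become NaN
df.loc[m1, 'Green'] = (m1 & m2).astype('int')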

    Fruit                  Color  Green
0   apple             'is green'    1.0
1   apple        'big and Green'    1.0
2   apple               'is red'    0.0
3  banana               'yellow"    NaN
4  banana            'is yellow'    NaN
5  banana  'is small and yellow"    NaN
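
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea written with np.where, which is closer to what the question attempted. It rests on some assumptions: the sample data is rebuilt from the question with the stray quote characters left out, and the names is_apple, has_green and Green_int are made up for illustration. The Green column comes out as float (1.0, 0.0, NaN) because NaN cannot be stored in a plain integer column; converting to pandas' nullable "Int64" dtype is one possible way to get 1, 0 and <NA> instead.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (quote characters omitted; an assumption)
df = pd.DataFrame({
    "Fruit": ["apple", "apple", "apple", "banana", "banana", "banana"],
    "Color": ["is green", "big and Green", "is red",
              "yellow", "is yellow", "is small and yellow"],
})

# Boolean masks: which rows are apples, and which colors mention "green"
is_apple = df["Fruit"].eq("apple")
has_green = df["Color"].str.contains("green", case=False, na=False)

# Apple rows get 1.0 or 0.0 depending on the color text; all other rows get NaN
df["Green"] = np.where(is_apple, has_green.astype(float), np.nan)

# Optional: nullable integer column, so the values show as 1, 0 and <NA>
df["Green_int"] = df["Green"].astype("Int64")

print(df)
```

Note that the single & condition in the question can only yield 1 or 0, so the banana rows would come out as 0 rather than NA; the version above makes the third outcome explicit.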
