Extract words from one df column and assign them to another column

Question

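I have two columns in my data frame: Pests and FieldComment. If the value in Pests is listed as 'None', I want to search the FieldComment column for specific words and overwrite what is in the Pests column. If none of the words are found in FieldComment, the Pests column can remain 'None'.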

Example:

pests_list = ['Spiders', 'Rodents', 'Ants', 'Honey Bees']

Pests	FieldComment
Spiders	Performed service.
None	Performed service for reported rodents.

Ideally, the above would turn into this:

Pests	FieldComment
Spiders	Performed service.
Rodents	Performed service for reported rodents.

This is what I have tried so far, but I can't quite figure it out:

for w in df['FieldComment'].str.split():
    for p in pests_list:
        if w.str.lower() == p.str.lower():
            df['Pests'] = p

I have also tried:

df.loc[df['Pests'] == 'None', "Pests"] =  *[pest for pest in pest_list if pest in df['FieldComment']]

And finally:

df.loc[df['Pests'] == 'None', "Pests"] = df.loc[df['Pests'] == 'None', "Pests"].apply(lambda x: pest for pest in pest_list if pest in df['FieldComment'] else 'None')

Answer 1

Convert the pests `list` to a `set`. Create a `set` from the words in `FieldComment`. Take the intersection of the two `set`s and use it to fill the `Pests` column (where it is empty).

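# Lowercase the pest names so that matching is case-insensitive.
pests_set = {p.lower() for p in pests_list}

# Fill only the rows where Pests is missing (NaN); the assignment aligns on the index.
# The comment is lowercased as well so that it can match the lowercased pest names.
df.loc[df["Pests"].isna(), "Pests"] = df["FieldComment"].apply(
    lambda x: ", ".join(set(x.lower().strip(".").split()).intersection(pests_set)).capitalize()
)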

     Pests                             FieldComment
0  Spiders                       Performed service.
1  Rodents  Performed service for reported rodents.
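
For anyone who wants to run the set-intersection idea end to end, here is a minimal self-contained sketch. It assumes missing values in `Pests` are NaN/None (as in the snippet above, not the string 'None' from the question). The helper name `find_pests` and the sample rows are illustrative only. Because the comment is split into single words, a multi-word name such as "Honey Bees" can never match with this approach.

```python
import re

import pandas as pd

pests_list = ["Spiders", "Rodents", "Ants", "Honey Bees"]
# Map each lowercase name back to its original spelling.
pests_by_lower = {p.lower(): p for p in pests_list}


def find_pests(comment):
    """Return the listed pests that appear as whole words in comment, or None.

    Hypothetical helper for illustration; multi-word names cannot match
    because the comment is broken into single words.
    """
    words = set(re.findall(r"[a-z]+", comment.lower()))
    found = words.intersection(pests_by_lower)
    if not found:
        return None
    return ", ".join(sorted(pests_by_lower[w] for w in found))


# Sample data assumed for this sketch.
df = pd.DataFrame(
    {
        "Pests": ["Spiders", None, None],
        "FieldComment": [
            "Performed service.",
            "Performed service for reported rodents.",
            "Nothing will be found here.",
        ],
    }
)

# Only rows with a missing Pests value are searched and overwritten.
mask = df["Pests"].isna()
df.loc[mask, "Pests"] = df.loc[mask, "FieldComment"].apply(find_pests)
print(df)
```

Returning `None` instead of an empty string keeps unmatched rows recognisable as missing, and the regular expression drops punctuation anywhere in the comment rather than only a trailing period.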

Answer 2

One possible solution is to iterate over all values in `pests_list` and search for the first match in the string.

pests_list = ["Spiders", "Rodents", "Ants", "Honey Bees"]

mask = df["Pests"].isna()

df.loc[mask, "Pests"] = [
    next((p for p in pests_list if p.lower() in c), None)
    for c in df.loc[mask, "FieldComment"].str.lower()
]
print(df)

Prints:

     Pests                             FieldComment
0  Spiders                       Performed service.
1  Rodents  Performed service for reported rodents.
2     None              Nothing will be found here.
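
The question stores the placeholder as the string 'None', while the snippet above tests for missing values with `isna()`. The sketch below is one way to cover both cases with the same first-match idea. The helper name `first_pest`, its `default` parameter, and the "honey bees" sample row are assumptions made for illustration. Because this is substring matching, a short name such as "Ants" would also match inside a word like "tenants".

```python
import pandas as pd

pests_list = ["Spiders", "Rodents", "Ants", "Honey Bees"]

# Sample data assumed for this sketch; the placeholder is the string "None".
df = pd.DataFrame(
    {
        "Pests": ["Spiders", "None", "None", "None"],
        "FieldComment": [
            "Performed service.",
            "Performed service for reported rodents.",
            "Treated honey bees near the porch.",
            "Nothing will be found here.",
        ],
    }
)


def first_pest(comment, default="None"):
    """Return the first pest in pests_list found in comment, else default."""
    lowered = comment.lower()
    return next((p for p in pests_list if p.lower() in lowered), default)


# Treat both the literal string "None" and real missing values as empty.
mask = df["Pests"].eq("None") | df["Pests"].isna()
df.loc[mask, "Pests"] = df.loc[mask, "FieldComment"].map(first_pest)
print(df)
```

Substring matching is what lets the two-word name "Honey Bees" be found here, at the cost of possible false positives inside longer words.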
