Searching for one string column's values in another string column


I have two dataframes named A and B. Dataframe A has a column named Comments, and dataframe B has a column named Solution.

Below is the data in those two columns in df_A and df_B:

df_A = pd.DataFrame({'Comments':
    """
Repaired loose connection  
No ice or water dispensing
No lights on the control panel"""},
index=[0])

df_B = pd.DataFrame({'Solution': ['A, B & C : Control panel not working: loose electrical connector',
    'D,E & F: Not cooling : loose electrical connector']})

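What I need is code that reads each word in the Comments column, searches for it in the Solution column, and fills an "Answer" column in df_A with the matching solution from df_B.

Output: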

Comments:
Repaired loose connection no ice or
Water dispensing, no lights on the control panel
Answer:
A, B & C : control panel not working: loose electrical connector.

This is the output I want.

Below is the code I tried, but it did not produce any results.

for index, row in df_B.iterrows():
    found = df_A['Comments'].str.contains(row['Solution'], case=False)
    df_A.loc[found, 'Answer'] = row['Solution']
python pandas dataframe search text
1 Answer

@Kinnuu, as far as I understand what you are trying to do, here is one possible solution:

import pandas as pd
import itertools

df_A = pd.DataFrame({'Comments':
"""Repaired loose connection
No ice or water dispensing
No lights on the control panel"""},
index=[0])

df_B = pd.DataFrame({'Solution': ['A, B & C : Control panel not working: loose electrical connector',
'D,E & F: Not cooling : loose electrical connector']})
# take all the unique words from the comments
words = set(itertools.chain.from_iterable(map(str.split,
                                          df_A.loc[0, "Comments"].split("\n"))))

scores = []
# for each row keep track of the index and the total number of matching words
for index, row in df_B.iterrows():
    # use split to make sure the match is on full words and lower to match on lowercase
    words_in_row = list(map(str.lower,
                        row["Solution"].split(" ")))
    scores.append((index,
               len([word for word in words if word.lower() in words_in_row])))

# pick the row with the highest number of matching words
high_score = max(scores, key=lambda x: x[-1])
# iterrows yields index labels, so look the solution up with .loc and use it as the answer
df_A["Answer"] = df_B.loc[high_score[0], "Solution"]

Right now this only works for a single row in df_A. Turning it into a function shouldn't be too much trouble, though.
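
As a rough sketch of what that function could look like, the snippet below scores every comment against every solution by counting shared words and keeps the best match. The helper names `tokenize` and `best_solution` are made up for this illustration, and the second comment row ("Freezer is not cooling") is invented sample data to show the multi-row case. Unlike the code above, it extracts words with a regular expression instead of splitting on spaces, so a token such as "working:" also counts as "working"; that is an assumption about what should match, not something the question specifies.

```python
import re

import pandas as pd


def tokenize(text):
    """Return the set of lowercase alphanumeric words found in text."""
    return set(re.findall(r"[a-z0-9]+", str(text).lower()))


def best_solution(comment, solutions):
    """Return the solution sharing the most words with comment.

    Returns None when no solution shares any word. On a tie, the
    solution that appears first in the list wins.
    """
    comment_words = tokenize(comment)
    best, best_score = None, 0
    for solution in solutions:
        score = len(comment_words & tokenize(solution))
        if score > best_score:
            best, best_score = solution, score
    return best


# First row is the comment from the question; second row is hypothetical.
df_A = pd.DataFrame({"Comments": [
    "Repaired loose connection\nNo ice or water dispensing\nNo lights on the control panel",
    "Freezer is not cooling",
]})
df_B = pd.DataFrame({"Solution": [
    "A, B & C : Control panel not working: loose electrical connector",
    "D,E & F: Not cooling : loose electrical connector",
]})

solutions = df_B["Solution"].tolist()
# Apply the matcher to every comment row.
df_A["Answer"] = df_A["Comments"].apply(lambda c: best_solution(c, solutions))
print(df_A)
```

Keep in mind that plain word counting treats common words such as "not" or "the" the same as meaningful ones, so on real data it may be worth dropping stop words before scoring.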
