Dynamically update a column in a Python DataFrame using substrings

Question (votes: 0, answers: 2)

Suppose I have a DataFrame, call it A:

Key_Word  Code
Market    A1
Theater   A2

I also have another DataFrame, call it B:

Sentence                                      Components_from_A
John went to the theater
Mary went to the Market and then the theater

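I want to find a way to dynamically update DataFrame B: loop over each B.Sentence and, if an instance of an A.Key_Word is found, record the corresponding A.Code. If a keyword from A appears more than once in a sentence, I want its code to appear only once in B.Components_from_A. If several keywords appear, I want them recorded as Code1+Code2+Code3...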

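Sentence                                                                        Components_from_A
John went to the theater                                                        A2
Mary went to the Market and then the theater                                    A1+A2
Jack decided to go to the theater now as opposed to going to the theater later  A2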

I think I could build something like this:

    import pandas as pd
    import numpy as np

    A=pd.read_csv('A.csv')
    B=pd.read_csv('B.csv')
    Dict=dict(zip(A.Key_Word,A.Code))

What I'm struggling with is dynamically updating the column in B, and the logic needed to do it the way I described.

python pandas dataframe dictionary
2 Answers

0 votes

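You can do this by building a dictionary from DataFrame A and iterating over each sentence in B, checking whether each keyword occurs in the sentence and collecting the matching codes.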

import pandas as pd

data_A = {'Key_Word': ['Market', 'Theater'], 'Code': ['A1', 'A2']}
data_B = {'Sentence': [
    'John went to the theater',
    'Mary went to the Market and then the theater',
    'Jack decided to go to the theater now as opposed to going to the theater later'
]}

A = pd.DataFrame(data_A)
B = pd.DataFrame(data_B)

keyword_to_code = dict(zip(A.Key_Word, A.Code))

def update_components(sentence, keyword_to_code):
    codes = set()  
    for keyword, code in keyword_to_code.items():
        if keyword.lower() in sentence.lower():
            codes.add(code)
    return '+'.join(sorted(codes))

B['Components_from_A'] = B['Sentence'].apply(lambda x: update_components(x, keyword_to_code))

print(B)

This gives A2 for the first sentence, A1+A2 for the second, and A2 for the third.
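
One caveat of the substring check in this answer is that `keyword.lower() in sentence.lower()` also matches a keyword embedded in a longer word. The following is a minimal sketch of a whole-word variant, not part of the original answer: it swaps the `in` test for a regex with `\b` word boundaries. The "Supermarket" sentence and the names `patterns` and `whole_word_components` are made up for illustration.

```python
import re
import pandas as pd

A = pd.DataFrame({'Key_Word': ['Market', 'Theater'], 'Code': ['A1', 'A2']})
B = pd.DataFrame({'Sentence': [
    'John went to the theater',
    'Mary went to the Supermarket',  # hypothetical: "market" only occurs inside another word
]})

# One case-insensitive, whole-word pattern per keyword (compiled once, reused per row).
patterns = [
    (re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE), code)
    for keyword, code in zip(A.Key_Word, A.Code)
]

def whole_word_components(sentence):
    # A set keeps each code once even if its keyword occurs several times.
    codes = {code for pattern, code in patterns if pattern.search(sentence)}
    return '+'.join(sorted(codes))

B['Components_from_A'] = B['Sentence'].apply(whole_word_components)
print(B)
```

With the plain substring check the second sentence would be tagged A1; with word boundaries it gets an empty string. Whether that is desirable depends on the data, so treat this as an option rather than a correction.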


0 votes

You can build a regex, use extractall to pull out the keywords, merge with A, then drop_duplicates and groupby.agg:

import re

A['Key_Word'] = A['Key_Word'].str.casefold()

pattern = '(%s)' % '|'.join(map(re.escape, A['Key_Word']))
# '(market|theater)'  (keywords were casefolded above)

B['Components_from_A'] = (B['Sentence'].str.casefold().str.extractall(pattern)
 .droplevel('match').reset_index(0)
 .merge(A, left_on=0, right_on='Key_Word')
 .drop_duplicates(['index', 'Code'])
 .groupby('index')['Code'].agg('+'.join)
)

Output:

                                            Sentence Components_from_A
0                           John went to the theater                A2
1       Mary went to the Market and then the theater             A1+A2
2  Jack decided to go to the theater now as oppos...                A2
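
For reference, here is a self-contained sketch of the same extractall/merge pipeline that can be run as-is. It makes two small changes that are assumptions on my part rather than part of the answer: the keywords are casefolded on a copy (`keys`) so that A is left unmodified, and sentences without any keyword get an empty string instead of NaN. The "Jack stayed home" sentence is hypothetical.

```python
import re
import pandas as pd

A = pd.DataFrame({'Key_Word': ['Market', 'Theater'], 'Code': ['A1', 'A2']})
B = pd.DataFrame({'Sentence': [
    'John went to the theater',
    'Mary went to the Market and then the theater',
    'Jack stayed home',  # hypothetical sentence containing no keyword
]})

# Casefold the keywords on a copy so A keeps its original spelling.
keys = A.assign(Key_Word=A['Key_Word'].str.casefold())
pattern = '(%s)' % '|'.join(map(re.escape, keys['Key_Word']))

matches = (
    B['Sentence'].str.casefold().str.extractall(pattern)
    .droplevel('match').reset_index(0)           # columns: 'index' (row of B) and 0 (matched keyword)
    .merge(keys, left_on=0, right_on='Key_Word')
    .drop_duplicates(['index', 'Code'])          # keep each code once per sentence
    .groupby('index')['Code'].agg('+'.join)
)

# Rows of B with no match are absent from `matches`; reindex and fill them.
B['Components_from_A'] = matches.reindex(B.index).fillna('')
print(B)
```

Without the `reindex(...).fillna('')` step, the assignment still aligns on the index, but unmatched sentences end up as NaN.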