Python: count the number of matching words between two columns

Question (votes: 0, answers: 1)

I want to count how many times the words listed in one column appear in another column. This is my dataframe:

original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0

I tried the code suggested in "Check if a column contains words from another column in a pandas dataframe":

import pandas as pd
import re
df['result'] = [', '.join([p for p in po
                           if re.search(f'\\b{p}\\b', o)])
                for o, po in zip(df.original, df.people.str.split(',\o*'))
                ]
# Afterwards I would count the number of words in the 'result' column

But then I get the following message:

error: bad escape \o at position 1

Can anyone suggest a fix?

python pandas dataframe text
1 answer
0 votes
In [39]: df = pd.DataFrame({'original':["John is a good friend", "Mary and Peter are going to marry", "Bond just met the Bond girl", "Chris is having dinner", "All Marys are here"], "people": ["John, Mary", "Peter, Mary", "Bond", '', "Mary"]})

In [40]: df
Out[40]:
                            original       people
0              John is a good friend   John, Mary
1  Mary and Peter are going to marry  Peter, Mary
2        Bond just met the Bond girl         Bond
3             Chris is having dinner
4                 All Marys are here         Mary

In [41]: df['result'] = df.apply(lambda row: sum((row['original'].count(p.strip()) for p in row['people'].split(',') if p), start=0), axis=1)

In [42]: df
Out[42]:
                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       1
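
Note that the `str.count` approach above counts substrings, so "Mary" is also found inside "Marys" (row 4 gives 1, while the question expects 0). The reported error most likely comes from `\o`, which is not a valid regular-expression escape; `\s*` (optional whitespace) was probably intended. Below is a minimal sketch of a whole-word variant, assuming that missing `people` values arrive as NaN (a float) or as an empty string. The helper name `count_whole_word_matches` is hypothetical and does not come from the question or the answer.

```python
import re

def count_whole_word_matches(text, people):
    """Count whole-word occurrences in `text` of each comma-separated name in `people`."""
    # A missing value (NaN is a float, not a str) has no names to count.
    if not isinstance(people, str):
        return 0
    # Split on commas with optional surrounding whitespace; drop empty pieces.
    names = [name for name in re.split(r'\s*,\s*', people.strip()) if name]
    # re.escape guards against names containing regex metacharacters;
    # the \b anchors keep "Mary" from matching inside "Marys".
    return sum(len(re.findall(rf'\b{re.escape(name)}\b', text)) for name in names)

# Sample data copied from the question (NaN stands in for the missing entry).
originals = [
    "John is a good friend",
    "Mary and Peter are going to marry",
    "Bond just met the Bond girl",
    "Chris is having dinner",
    "All Marys are here",
]
peoples = ["John, Mary", "Peter, Mary", "Bond", float("nan"), "Mary"]

results = [count_whole_word_matches(o, p) for o, p in zip(originals, peoples)]
print(results)  # [1, 2, 2, 0, 0]
```

With the dataframe from the question, the same helper could be applied as `df['result'] = [count_whole_word_matches(o, p) for o, p in zip(df['original'], df['people'])]`. Matching is case-sensitive here, which is why "marry" in row 1 is not counted as "Mary".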