I want to count how many times the words listed in one column appear in another column. Here is my DataFrame:
original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0
I tried the code suggested here, "Check if a column contains words from another column in pandas dataframe":
import pandas as pd
import re
df['result'] = [', '.join([p for p in po
                           if re.search(f'\\b{p}\\b', o)])
                for o, po in zip(df.original, df.people.str.split(',\o*'))
                ]
# And after I would try to calculate the number of words in column 'result'
But then I get the following message:
error: bad escape \o at position 1
Can anyone suggest a solution?
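The message comes from the split pattern: `\o` is not a recognised escape in Python's `re` module, so compiling `',\o*'` fails. The pattern was presumably meant to be `',\s*'` (a comma followed by optional whitespace); that reading of the intent is an assumption. A minimal sketch that reproduces the error and shows the assumed fix:

```python
import re

# '\o' is not a valid regex escape, so compiling the pattern raises re.error.
try:
    re.compile(r',\o*')
    compiled = True
    message = ''
except re.error as exc:
    compiled = False
    message = str(exc)

# Assumed intent: split on a comma followed by optional whitespace.
parts = re.split(r',\s*', 'Peter, Mary')
print(compiled, message, parts)
```

With `',\s*'` the same pattern can be passed to `df.people.str.split`.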
In [39]: df = pd.DataFrame({'original':["John is a good friend", "Mary and Peter are going to marry", "Bond just met the Bond girl", "Chris is having dinner", "All Marys are here"], "people": ["John, Mary", "Peter, Mary", "Bond", '', "Mary"]})
In [40]: df
Out[40]:
                            original       people
0              John is a good friend   John, Mary
1  Mary and Peter are going to marry  Peter, Mary
2        Bond just met the Bond girl         Bond
3             Chris is having dinner
4                 All Marys are here         Mary
In [41]: df['result'] = df.apply(lambda row: sum((row['original'].count(p.strip()) for p in row['people'].split(',') if p), start=0), axis=1)
In [42]: df
Out[42]:
                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       1
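Note that `str.count` counts substrings, which is why row 4 gets 1 for "Marys" instead of the 0 expected in the question. If whole-word matches are wanted, one possible variant uses `re.findall` with `\b` boundaries. This is only a sketch: the helper name `count_names` is hypothetical, and it assumes that a missing `people` value (NaN, None or an empty string) should count as 0.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "original": [
        "John is a good friend",
        "Mary and Peter are going to marry",
        "Bond just met the Bond girl",
        "Chris is having dinner",
        "All Marys are here",
    ],
    "people": ["John, Mary", "Peter, Mary", "Bond", None, "Mary"],
})

def count_names(text, people):
    """Count whole-word occurrences in `text` of each comma-separated name."""
    if not isinstance(people, str):  # NaN / None: nothing to look for
        return 0
    names = [p.strip() for p in people.split(",") if p.strip()]
    # re.escape guards against names containing regex metacharacters.
    return sum(len(re.findall(rf"\b{re.escape(name)}\b", text)) for name in names)

df["result"] = [count_names(o, p) for o, p in zip(df["original"], df["people"])]
print(df)
```

On this sample the `result` column comes out as 1, 2, 2, 0, 0, matching the table in the question: "Marys" and the lowercase "marry" are not counted as "Mary".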