I want to extract all the words in df1 that match the words in df2.
df1 = pd.DataFrame(['Dog has 4 legs.It has 2 eyes.','Fish has fins','Cat has paws.It eats fish','Monkey has tail'],columns=['Description'])
df2 = pd.DataFrame(['Fish','Legs','Eyes'],columns=['Parts'])
Df1                                    Df2
|---------------------------------|    |---------|
| **Description**                 |    | Parts   |
|---------------------------------|    |---------|
| Dog has 4 legs.It has 2 eyes.   |    | Fish    |
|---------------------------------|    |---------|
| Fish has fins                   |    | Legs    |
|---------------------------------|    |---------|
| Cat has paws.It eats fish       |    | Eyes    |
|---------------------------------|    |---------|
| Monkey has tail                 |
|---------------------------------|
Desired output:
|---------------------------------|-----------|
| **Description**                 | Parts     |
|---------------------------------|-----------|
| Dog has 4 legs.It has 2 eyes.   | Legs,Eyes |
|---------------------------------|-----------|
| Fish has fins                   | Fish      |
|---------------------------------|-----------|
| Cat has paws.It eats fish       | Fish      |
|---------------------------------|-----------|
| Monkey has tail                 |           |
|---------------------------------|-----------|
IIUC, you can use str.extractall to collect all the matches, then groupby on the original row index to build a list or a joined string per row.
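import re

# Build an alternation pattern from the parts: Fish|Legs|Eyes
pat = '|'.join(df2['Parts'].tolist())

# extractall returns one row per match; group on the original row index
# (level 0) and join the matches of each row into one string
df1['Parts'] = (
    df1['Description']
    .str.extractall(f"({pat})", flags=re.IGNORECASE)
    .groupby(level=0)[0]
    .agg(','.join)
)
print(df1)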
                     Description      Parts
0  Dog has 4 legs.It has 2 eyes.  legs,eyes
1                  Fish has fins       Fish
2      Cat has paws.It eats fish       fish
3                Monkey has tail        NaN
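To make the two steps easier to follow, here is a small self-contained sketch that prints the intermediate result of extractall before it is grouped. The names `matches` and `parts` are only illustrative, and the use of `re.escape` is an addition of mine (an assumption that the parts should be matched literally), not part of the answer above.

```python
import re
import pandas as pd

df1 = pd.DataFrame(
    ['Dog has 4 legs.It has 2 eyes.', 'Fish has fins',
     'Cat has paws.It eats fish', 'Monkey has tail'],
    columns=['Description'])
df2 = pd.DataFrame(['Fish', 'Legs', 'Eyes'], columns=['Parts'])

# Escape each part so that regex metacharacters are matched literally
pat = '|'.join(re.escape(p) for p in df2['Parts'])

# One row per match, indexed by (original row, match number)
matches = df1['Description'].str.extractall(f"({pat})", flags=re.IGNORECASE)
print(matches)

# Collapse the match level back to one comma-separated string per original row
parts = matches.groupby(level=0)[0].agg(','.join)
print(parts)
```

Rows without any match (here row 3) do not appear in `parts` at all, which is why they end up as NaN once the result is assigned back to df1.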
@Datanovice's solution is the better one, since everything stays within Pandas. Here is an alternative that is also faster (string operations in Pandas are not that fast).
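from itertools import product
from collections import defaultdict

# Lower-case the parts once so the comparison is case-insensitive
res = df2.Parts.str.lower().array
d = defaultdict(list)
# Check every (description, part) pair and collect the parts found in each description
for description, word in product(df1.Description, res):
    if word in description.lower():
        d[description].append(word)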
d
defaultdict(list,
            {'Dog has 4 legs.It has 2 eyes.': ['legs', 'eyes'],
             'Fish has fins': ['fish'],
             'Cat has paws.It eats fish': ['fish']})
df1['parts'] = df1.Description.map(d).str.join(',')
                     Description      parts
0  Dog has 4 legs.It has 2 eyes.  legs,eyes
1                  Fish has fins       fish
2      Cat has paws.It eats fish       fish
3                Monkey has tail
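One thing to keep in mind with the loop above: `word in description.lower()` is a plain substring test, so a part such as "eyes" would also be found inside a longer word like "keyes". If whole-word matching is what you want, something along these lines might work. It is only a sketch using the standard library; `find_parts` is a hypothetical helper name, not something from either answer.

```python
import re

def find_parts(description, parts):
    """Return the parts that occur in description as whole words (case-insensitive)."""
    found = []
    for part in parts:
        # \b anchors the match at word boundaries; re.escape keeps the part literal
        if re.search(rf"\b{re.escape(part)}\b", description, flags=re.IGNORECASE):
            found.append(part.lower())
    return found

descriptions = ['Dog has 4 legs.It has 2 eyes.', 'Fish has fins',
                'Cat has paws.It eats fish', 'Monkey has tail']
parts = ['Fish', 'Legs', 'Eyes']

# Map each description to its comma-separated matches (empty string if none)
result = {d: ','.join(find_parts(d, parts)) for d in descriptions}
print(result)
```

On the sample data this gives the same matches as the substring version; the two only differ when a part appears inside a longer word.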