Finding (possibly multi-word) phrases in a sentence in Python

Question

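I have a sentence in which I want to find keywords. A keyword is usually a single word, but it can also be a multi-word phrase (such as "cost in euros"). So if I have the sentence cost in euros of bacon, it should find cost in euros in that sentence and return true.

For this, I was using this code: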

if any(phrase in line for phrase in keyword['aliases']):

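where line is the input and aliases is an array of phrases that match a keyword (for cost in euros, this is ['cost in euros', 'euros', 'euro cost']).

However, I noticed that it also triggers on parts of words. For example, I had a match phrase of y and a sentence of trippy cake. I would not expect this to return true, but it does, because it apparently finds the y in trippy. How do I get this to check only whole words? Originally I was doing this keyword search with a list of words (essentially doing line.split() and checking those), but that doesn't work for multi-word keyword aliases.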
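Both behaviours described above can be reproduced with a small sketch. The helper names substring_match and split_match are hypothetical, introduced only for this illustration, and the inputs are the examples from the text:

```python
def substring_match(line, aliases):
    # `in` on two strings is a substring test, so 'y' is found inside 'trippy'.
    return any(phrase in line for phrase in aliases)

def split_match(line, aliases):
    # Comparing against whole words avoids partial hits, but a multi-word
    # alias can never equal a single word produced by split().
    words = line.split()
    return any(phrase in words for phrase in aliases)

print(substring_match('trippy cake', ['y']))                     # True (unwanted)
print(split_match('trippy cake', ['y']))                         # False
print(split_match('cost in euros of bacon', ['cost in euros']))  # False (unwanted)
```

Neither approach handles both cases, which is the gap the question is about.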

Answer 1

This should do what you're looking for:

import re

aliases = [
    'cost.',
    '.cost',
    '.cost.',
    'cost in euros of bacon',
    'rocking euros today',
    'there is a cost inherent to bacon',
    'europe has cost in place',
    'there is a cost.',
    'I was accosted.',
    'dealing with euro costing is painful']
phrases = ['cost in euros', 'euros', 'euro cost', 'cost']

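# Collect every sentence that contains at least one phrase as a whole word
# or whole phrase; set() removes sentences matched by several phrases.
matched = list(set([
    alias
    for alias in aliases
    for phrase in phrases
    # \b anchors the match at word boundaries; re.escape() keeps any
    # regex metacharacters in a phrase literal.
    if re.search(r'\b{}\b'.format(re.escape(phrase)), alias)
    ]))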

print(matched)

Output (the order may vary between runs, because a set is unordered):

['there is a cost inherent to bacon', '.cost.', 'rocking euros today', 'there is a cost.', 'cost in euros of bacon', 'europe has cost in place', 'cost.', '.cost']

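Basically, we grab all of the matches, using the Python re module as our test and a compound list comprehension, which includes cases where multiple phrases occur in a given alias. We then use set() to trim the duplicates from the list, and list() to coerce the set back into a list.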
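The question asked for a true/false check on a single line, so the same word-boundary idea can be adapted as below. This is a sketch under assumptions rather than part of the original answer: the function name line_matches is hypothetical, and the aliases are joined into one alternation so the line is scanned once.

```python
import re

def line_matches(line, aliases):
    """Return True if any alias occurs in line as a whole word or phrase."""
    if not aliases:
        # An empty alternation would match at any word boundary.
        return False
    # Escape each alias so regex metacharacters are treated literally.
    alternation = '|'.join(re.escape(alias) for alias in aliases)
    pattern = re.compile(r'\b(?:{})\b'.format(alternation))
    return pattern.search(line) is not None

aliases = ['cost in euros', 'euros', 'euro cost']
print(line_matches('cost in euros of bacon', aliases))   # True
print(line_matches('trippy cake', ['y']))                # False
print(line_matches('euro costing is painful', aliases))  # False
```

Note that \b only marks a boundary between a word character and a non-word character, so an alias that begins or ends with punctuation may need a different anchor.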

References:

Lists: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

List comprehensions: https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions

Sets: https://docs.python.org/3/tutorial/datastructures.html#sets

re (regular expressions): https://docs.python.org/3/library/re.html#module-re
