spaCy: removing stop words without affecting named entities

Question

I am trying to remove stop words from a string, with one condition: named entities in the string must not be removed.

import spacy
nlp = spacy.load('en_core_web_sm')
text = "The Bank of Australia has an agreement according to the Letter Of Offer which states that the deduction should be made at the last date of each month"
doc = nlp(text)

If I check the named entities in the text, I get the following:

print(doc.ents)
(The Bank of Australia, the Letter Of Offer, the last date of each month)

The usual way to remove stop words looks like this:

[token.text for token in doc if not token.is_stop]
['Bank', 'Australia', 'agreement', 'according', 'Letter', 'Offer', 'states', 'deduction', 'date', 'month']

The usual approach completely destroys the meaning I need for my task. I want to keep the named entities intact.

I tried appending the named entities to the same list:

list1 = [token.text for token in doc if not token.is_stop]
list2 = [str(a) for a in doc.ents]

list1 + list2

['Bank',
 'Australia',
 'agreement',
 'according',
 'Letter',
 'Offer',
 'states',
 'deduction',
 'date',
 'month',
 'The Bank of Australia',
 'the Letter Of Offer',
 'the last date of each month']

Is there any other way to do this?


Answer 1

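You can check at the token level whether a token is part of an entity by using token.ent_iob_ or token.ent_type_ (see the API documentation). So you probably want to keep every token that is either not a stop word or part of an entity.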
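A minimal sketch of that idea, not necessarily the exact code the answer had in mind. It relies on token.ent_type_ being an empty string for tokens outside any entity, so the filter keeps a token when it is not a stop word or when it carries an entity label. The names drop_stop_words_keep_entities and demo are hypothetical helpers made up for this illustration; token.is_stop, token.ent_type_ and token.text are the spaCy token attributes already mentioned above.

```python
def drop_stop_words_keep_entities(doc):
    """Return token texts, dropping stop words unless they belong to an entity.

    `doc` is expected to be a spaCy Doc (or any iterable of objects that
    expose .text, .is_stop and .ent_type_).
    """
    # token.ent_type_ is "" outside entities, so it is falsy there and the
    # stop-word test alone decides; inside an entity the token is always kept.
    return [token.text for token in doc if not token.is_stop or token.ent_type_]


def demo():
    # Imported here so the helper above has no hard dependency on spaCy.
    import spacy

    # Assumes the en_core_web_sm model from the question is installed.
    nlp = spacy.load("en_core_web_sm")
    text = (
        "The Bank of Australia has an agreement according to the Letter Of Offer "
        "which states that the deduction should be made at the last date of each month"
    )
    doc = nlp(text)
    print(drop_stop_words_keep_entities(doc))
```

Calling demo() should print the tokens in their original order, with stop words such as "The" and "of" retained wherever the model tagged them as part of an entity. The exact result depends on which spans the installed model recognizes. If you prefer the IOB tags, testing token.ent_iob_ in ("B", "I") instead of token.ent_type_ should behave the same way for this purpose.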
