How to lemmatize a list of sentences

Question · Votes: 0 · Answers: 2

How can I lemmatize a list of sentences in Python?

from nltk.stem.wordnet import WordNetLemmatizer
a = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
lemmatized = [lmtzr.lemmatize(word) for word in a]
print(lemmatized)

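This is what I tried, but it gives me back the same sentences. Do I need to tokenize the sentences into words first for this to work properly?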

python list nltk lemmatization
2 Answers
2
votes

TL;DR:

pip3 install -U pywsd

Then:

>>> from pywsd.utils import lemmatize_sentence

>>> text = 'i like cars'
>>> lemmatize_sentence(text)
['i', 'like', 'car']
>>> lemmatize_sentence(text, keepWordPOS=True)
(['i', 'like', 'cars'], ['i', 'like', 'car'], ['n', 'v', 'n'])

>>> text = 'The cat likes cars'
>>> lemmatize_sentence(text, keepWordPOS=True)
(['The', 'cat', 'likes', 'cars'], ['the', 'cat', 'like', 'car'], [None, 'n', 'v', 'n'])

>>> text = 'The lazy brown fox jumps, and the cat likes cars.'
>>> lemmatize_sentence(text)
['the', 'lazy', 'brown', 'fox', 'jump', ',', 'and', 'the', 'cat', 'like', 'car', '.']

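Otherwise, take a look at how the function in pywsd works. It:

  • tokenizes the string
  • runs a POS tagger and maps the tags to the WordNet POS tagset
  • attempts to stem each token
  • finally calls the lemmatizer with the POS and/or the stem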

https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L129


1
votes

You have to lemmatize each word separately. Your code passes whole sentences to the lemmatizer instead. The corrected snippet:

from nltk.stem.wordnet import WordNetLemmatizer
from nltk import word_tokenize
sents = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
lemmatized = [[lmtzr.lemmatize(word) for word in word_tokenize(s)]
              for s in sents]
print(lemmatized)
#[['i', 'like', 'car'], ['cat', 'are', 'the', 'best']]

You can also get better results if you POS-tag the tokens first and then pass the POS information to the lemmatizer.
