python3 nltk WordNetLemmatizer raises an error [duplicate]


I was working through a book and typed in its code. However, I get the error below. What should I do?

from nltk.stem import PorterStemmer, WordNetLemmatizer

sent = ('The laughs you two heard were triggered by memories '
        'of his own high j-flying exits for moving beasts')

lemmatizer = WordNetLemmatizer()
words = lemmatizer.lemmatize(sent, pos='pos')  # raises the KeyError below

File "D:/machine_learning/nltk_mapper.py", line 24, in <module>
    word = lemmatizer.lemmatize(words, pos='pos')
  File "D:\machine_learning\venv\lib\site-packages\nltk\stem\wordnet.py", line 40, in lemmatize
    lemmas = wordnet._morphy(word, pos)
  File "D:\machine_learning\venv\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1818, in _morphy
    exceptions = self._exception_map[pos]
KeyError: 'pos'

The expected result is to print only the meaningful words, like this:

  ['The', 'laugh', 'two', 'hear', 'trigger', 
   'memory', 'high', 'fly', 'exit', 'move', 'beast']

Thanks


I have solved it myself. I referred to the following post: NLTK: lemmatizer and pos_tag

from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith('NN'):    # nouns
            yield wnl.lemmatize(word, pos='n')
        elif tag.startswith('VB'):  # verbs
            yield wnl.lemmatize(word, pos='v')
        elif tag.startswith('JJ'):  # adjectives
            yield wnl.lemmatize(word, pos='a')
        # else:                     # any other tag is dropped
        #     yield word

print(' '.join(lemmatize_all('The laughs you two heard were triggered by memories of his own high j-flying exits for moving beasts')))

Result -> laugh hear trigger memory own high j-flying exit move beast
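As a side note, the three if/elif branches can be collapsed into a lookup table keyed on the first letter of the Penn Treebank tag. This is my own sketch, not part of the original solution; the TAG_TO_POS name is made up, and the 'n' fallback keeps the tokens the generator above silently drops:

from nltk.tag import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Hypothetical helper: first letter of the Penn Treebank tag -> WordNet POS
TAG_TO_POS = {'N': 'n', 'V': 'v', 'J': 'a', 'R': 'r'}

def lemmatize_compact(sentence):
    wnl = WordNetLemmatizer()
    for word, tag in pos_tag(word_tokenize(sentence)):
        # Unknown tags fall back to 'n', so no token is dropped
        yield wnl.lemmatize(word, pos=TAG_TO_POS.get(tag[0], 'n'))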

Thanks

python-3.x nltk pos-tagger
1 Answer

The purpose of lemmatisation is to group together the different inflected forms of a word under a single item, called the lemma. For example, a lemmatiser should map gone, going and went to go. Accordingly, lemmatize() expects a single word plus a WordNet POS tag such as 'n', 'v', 'a' or 'r'; passing the literal string 'pos' is what raises KeyError: 'pos'. So we have to lemmatize each word separately.
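To make that concrete, here is a minimal check (my example, not from the original answer), assuming the WordNet data is installed:

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
# All three inflected forms collapse to the lemma 'go' when tagged as a verb
for form in ('gone', 'going', 'went'):
    print(form, '->', wnl.lemmatize(form, pos='v'))
# gone -> go
# going -> go
# went -> go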

from nltk.stem import WordNetLemmatizer

sent = 'The laughs you two heard were triggered by memories of his own high j-flying exits for moving beasts'
sent_tokenized = sent.split(" ")
lemmatizer = WordNetLemmatizer()
# Lemmatize each token separately; with no pos argument, lemmatize()
# treats every word as a noun
words = [lemmatizer.lemmatize(word) for word in sent_tokenized]
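Note the default: without a pos argument, lemmatize() assumes 'n', so this version only normalizes the nouns and verbs pass through unchanged. A quick comparison:

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
print(wnl.lemmatize('heard'))           # heard (treated as a noun)
print(wnl.lemmatize('heard', pos='v'))  # hear  (treated as a verb)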