NLTK identifies verbs as nouns in imperatives

Question

I am using the NLTK POS tagger as follows:

from nltk import pos_tag, word_tokenize

sent1 = 'get me now'
sent2 = 'run fast'
tags = pos_tag(word_tokenize(sent2))
print(tags)
[('run', 'NN'), ('fast', 'VBD')]

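I found a similar post, NLTK Thinks that Imperatives are Nouns, which suggests adding the words to the dictionary as verbs. The problem is that I have too many such unknown words. I do have one clue, though: they always appear at the start of the phrase.

For example: "Download now", "Book now", "Sign up"

How can I help NLTK produce the correct result?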

Answer 1

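You can load other third-party models in NLTK. Take a look at Python NLTK pos_tag not returning the correct part-of-speech tag

To answer the question with a bit of a hack, you can trick the POS tagger by prepending a pronoun so that the verb gets a subject, e.g.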

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> pos_tag(['He'] + sent1)
[('He', 'PRP'), ('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
>>> pos_tag(['He'] + sent1)[1:]
[('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]

To wrap the trick in a function:

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> def imperative_pos_tag(sent):
...     return pos_tag(['He']+sent)[1:]
... 
>>> imperative_pos_tag(sent1)
[('get', 'VBD'), ('me', 'PRP'), ('now', 'RB')]
>>> imperative_pos_tag(sent2)
[('run', 'VBP'), ('fast', 'RB')]

If you want all verbs in the imperative to receive the base-form VB tag:

>>> from nltk import pos_tag
>>> sent1 = 'get me now'.split()
>>> sent2 = 'run fast'.split()
>>> def imperative_pos_tag(sent):
...     return [(word, tag[:2]) if tag.startswith('VB') else (word,tag) for word, tag in pos_tag(['He']+sent)[1:]]
... 
>>> imperative_pos_tag(sent1)
[('get', 'VB'), ('me', 'PRP'), ('now', 'RB')]
>>> imperative_pos_tag(sent2)
[('run', 'VB'), ('fast', 'RB')]
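
The clue in the question (the unknown words always come first in the phrase) also suggests a simpler post-processing rule, which is not part of the answer above: if the first token comes back as a common noun, retag it as a base-form verb. A minimal sketch, assuming every input really is an imperative; the function name `retag_leading_noun` is made up for illustration, and it works on any list of (word, tag) pairs such as the output of pos_tag:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def retag_leading_noun(tagged: Tagged) -> Tagged:
    """Retag the first token as VB if the tagger called it a common noun (NN).

    Assumption: the phrase is an imperative, so its first word is a verb.
    Phrases that genuinely start with a noun would be retagged wrongly.
    """
    if not tagged:
        return []
    word, tag = tagged[0]
    if tag == "NN":
        # Only the first token is touched; the rest is passed through as-is.
        return [(word, "VB")] + list(tagged[1:])
    return list(tagged)


# Tagger output for 'run fast' as reported in the question.
print(retag_leading_noun([("run", "NN"), ("fast", "VBD")]))
# [('run', 'VB'), ('fast', 'VBD')]
```

Note that this only repairs the first token: 'fast' keeps its wrong VBD tag here, whereas the pronoun trick also gives the tagger better context for the words that follow.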

Answer 2

I found a newer library called spaCy (https://spacy.io/usage/linguistic-features#pos-tagging), and it works well for this:

import spacy
nlp = spacy.load("en_core_web_sm")
text = ("run fast")
doc = nlp(text)
verbs = [(token, token.pos_) for token in doc]
print(verbs)

Output:

[(run, 'VERB'), (fast, 'ADV')]

Installation guide: https://spacy.io/usage
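
The snippet above prints spaCy's coarse `pos_` labels (VERB, ADV). If you need something comparable to the NLTK output, each token also has a `tag_` attribute with the fine-grained tag. A small sketch, assuming spaCy and the `en_core_web_sm` model are installed; the helper name `coarse_and_fine_tags` is made up for illustration, and I have not listed the output because it depends on the model version:

```python
def coarse_and_fine_tags(text, nlp):
    """Return (token text, coarse POS, fine-grained tag) for each token.

    `nlp` is expected to be a loaded spaCy pipeline, i.e. a callable that
    turns a string into tokens exposing .text, .pos_ and .tag_.
    """
    return [(token.text, token.pos_, token.tag_) for token in nlp(text)]


# Usage with a real pipeline (requires spaCy and the model to be installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(coarse_and_fine_tags("run fast", nlp))
```

Passing the pipeline in as an argument means the model is loaded once and reused for every phrase.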
