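Using the NLTK Unigram Tagger, I am training on sentences from the Brown Corpus.

I tried different categories, but I get roughly the same value each time. For each category, such as fiction, romance, or humor, the value is around 0.9328: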
    import nltk
    from nltk.corpus import brown

    # Fiction: train on the tagged sentences, then score on those same sentences
    brown_tagged_sents = brown.tagged_sents(categories='fiction')
    brown_sents = brown.sents(categories='fiction')
    unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
    unigram_tagger.evaluate(brown_tagged_sents)
    # >>> 0.9415956079897209

    # Romance: same procedure with a different category
    brown_tagged_sents = brown.tagged_sents(categories='romance')
    brown_sents = brown.sents(categories='romance')
    unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
    unigram_tagger.evaluate(brown_tagged_sents)
    # >>> 0.9348490474422324
Why is this the case? Is it because they come from the same corpus? Or is it because their part-of-speech tagging is the same?
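One thing that may be worth checking: the snippet scores each tagger on the same sentences it was trained on, so the number mostly reflects how often a word carries its most frequent tag in that category. The sketch below is not NLTK code. It is a minimal pure-Python stand-in for a unigram tagger, and the helper names `train_unigram` and `accuracy` as well as the toy sentences are made up for illustration (they are not Brown Corpus data). It only shows how a training-set score and a held-out score can differ.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def accuracy(model, tagged_sents):
    """Fraction of tokens whose predicted tag equals the gold tag.

    Words never seen in training get None, which never matches a gold tag.
    """
    total = 0
    correct = 0
    for sent in tagged_sents:
        for word, tag in sent:
            total += 1
            if model.get(word) == tag:
                correct += 1
    return correct / total if total else 0.0


# Toy, made-up sentences (hypothetical data, not taken from the Brown Corpus).
train = [
    [("the", "AT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "AT"), ("run", "NN"), ("ends", "VBZ")],
    [("they", "PPSS"), ("run", "VB")],
    [("they", "PPSS"), ("run", "VB")],
]
held_out = [
    [("the", "AT"), ("cat", "NN"), ("runs", "VBZ")],
]

model = train_unigram(train)

# Scored on its own training data: only the ambiguous word "run" is missed
# once, so 9 of 10 tokens are correct.
train_acc = accuracy(model, train)

# Scored on unseen sentences: "cat" was never seen, so 2 of 3 tokens are correct.
held_out_acc = accuracy(model, held_out)

print(train_acc, held_out_acc)
```

With NLTK itself, the analogous experiment would be to slice `brown_tagged_sents` into a training part and a test part, train `nltk.UnigramTagger` on the first, and evaluate on the second. Whether the per-category scores then still look alike is something to measure rather than assume.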