How to apply nltk.pos_tag() to ngrams

I need to use nltk.pos_tag() together with bigrams. This is my code:

from nltk.util import ngrams
from collections import Counter
bigrams = list(ngrams(all_file_data, 2))
print(bigrams[:50])
print(Counter(bigrams).most_common(30))

The output is:

[('SUBDELAGATION', 'ON'), ('ON', 'AGENDA'), ('AGENDA', 'ITEM'), ('ITEM', '3'), ...]

How can I get the POS tags along with the bigram frequencies, as in the attached picture?

python nltk pos-tagger
1 Answer

Try this:

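from collections import Counter

from nltk import pos_tag, word_tokenize
from nltk.util import ngrams

text = "hello world is a common sentence. A common sentence is foo bar. A foo bar is a common ice cream."
# Tokenize first, then tag; the result is a list of (word, tag) tuples.
tagged_texts = pos_tag(word_tokenize(text))

# Count bigrams built from the (word, tag) tuples.
counter = Counter(ngrams(tagged_texts, 2))

# List every bigram with its count, most frequent first.
counter.most_common()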

[out]:

[((('is', 'VBZ'), ('a', 'DT')), 2),
 ((('a', 'DT'), ('common', 'JJ')), 2),
 ((('common', 'JJ'), ('sentence', 'NN')), 2),
 ((('.', '.'), ('A', 'DT')), 2),
 ((('foo', 'JJ'), ('bar', 'NN')), 2),
 ((('hello', 'JJ'), ('world', 'NN')), 1),
 ((('world', 'NN'), ('is', 'VBZ')), 1),
 ((('sentence', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('common', 'JJ')), 1),
 ((('sentence', 'NN'), ('is', 'VBZ')), 1),
 ((('is', 'VBZ'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('.', '.')), 1),
 ((('A', 'DT'), ('foo', 'JJ')), 1),
 ((('bar', 'NN'), ('is', 'VBZ')), 1),
 ((('common', 'JJ'), ('ice', 'NN')), 1),
 ((('ice', 'NN'), ('cream', 'NN')), 1),
 ((('cream', 'NN'), ('.', '.')), 1)]
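
The nested tuples above can be hard to read, so here is a minimal sketch of one way to flatten them into word/tag rows, and to count bigrams of the tags alone. To keep it runnable without NLTK or its models, it assumes a small hand-written list of (word, tag) pairs in place of real pos_tag() output, and uses zip() as a stand-in for ngrams(..., 2). The helper name format_bigram is hypothetical.

```python
from collections import Counter

# Hand-written (word, tag) pairs standing in for the output of pos_tag();
# the tags are illustrative assumptions, not real tagger output.
tagged = [
    ("a", "DT"), ("common", "JJ"), ("sentence", "NN"), ("is", "VBZ"),
    ("a", "DT"), ("common", "JJ"), ("phrase", "NN"), (".", "."),
]

# zip(seq, seq[1:]) yields the same adjacent pairs as ngrams(seq, 2).
bigram_counts = Counter(zip(tagged, tagged[1:]))


def format_bigram(bigram, count):
    """Render ((w1, t1), (w2, t2)) and its count as 'w1/t1 w2/t2<TAB>count'."""
    (w1, t1), (w2, t2) = bigram
    return f"{w1}/{t1} {w2}/{t2}\t{count}"


# One readable row per bigram, most frequent first.
rows = [format_bigram(b, c) for b, c in bigram_counts.most_common()]
print("\n".join(rows))

# Count bigrams of the tags only, ignoring the words themselves.
tags = [tag for _, tag in tagged]
tag_bigram_counts = Counter(zip(tags, tags[1:]))
print(tag_bigram_counts.most_common(2))
```

If the tokens are already in a list, as all_file_data appears to be in the question, that list can be passed to pos_tag() directly, without calling word_tokenize() first.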