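I want to write a function that returns the frequency of each n-gram in a given text. Please help. This is the code I wrote for counting bigram (2-gram) frequencies:
Code: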
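from nltk import FreqDist
from nltk.util import ngrams

def compute_freq():
    textfile = "please write a function"
    bigramfdist = FreqDist()
    # Iterate over lines, not characters: looping over a plain string yields single characters,
    # so the len(line) > 1 check would skip everything
    for line in textfile.splitlines():
        if len(line) > 1:
            tokens = line.strip().split(' ')
            bigrams = ngrams(tokens, 2)
            # Add this line's bigrams to the running counts
            bigramfdist.update(bigrams)
    return bigramfdist

bigramfdist = compute_freq()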
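A minimal sketch of how the loop above generalizes to any order n, assuming only the standard library: `collections.Counter` stands in for NLTK's `FreqDist`, tokens come from plain whitespace splitting, and `ngram_counts` is a hypothetical helper name. The `zip` over n staggered slices of the token list produces the same sliding window of tuples that `ngrams(tokens, n)` does.

```python
from collections import Counter

def ngram_counts(lines, n=2):
    """Count n-grams of order n across an iterable of text lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        tokens = line.split()  # simplification: whitespace tokenization
        # zip over n staggered views of the token list gives a sliding window of size n;
        # lines with fewer than n tokens contribute nothing
        grams = zip(*(tokens[i:] for i in range(n)))
        counts.update(grams)
    return counts

text = "please write a function\nplease write a test"
print(ngram_counts(text.splitlines(), n=2))
```

Passing `n=3` counts trigrams instead, with no other changes.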
The question has no expected-output section, so I'm assuming this is what you need:
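import nltk

def compute_freq(sentence, n_value=2):
    # word_tokenize needs NLTK's Punkt tokenizer data; run nltk.download('punkt') once
    # (recent NLTK releases may ask for 'punkt_tab' instead)
    tokens = nltk.word_tokenize(sentence)
    # Sliding window of n_value consecutive tokens, yielded as tuples
    ngrams = nltk.ngrams(tokens, n_value)
    # FreqDist counts how often each n-gram tuple occurs
    ngram_fdist = nltk.FreqDist(ngrams)
    return ngram_fdist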
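If "frequency" should mean relative frequency (share of all n-grams) rather than a raw count, the counts can be normalized by their total. NLTK's `FreqDist` has a `freq(sample)` method for this; the sketch below does the same with the standard library only, and `relative_ngram_freq` is a hypothetical name. It assumes the text has already been split into tokens.

```python
from collections import Counter

def relative_ngram_freq(tokens, n=2):
    """Map each n-gram to its share of all n-grams of order n (hypothetical helper)."""
    counts = Counter(zip(*(tokens[i:] for i in range(n))))
    total = sum(counts.values())
    # Guard against inputs shorter than n, which produce no n-grams at all
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}

tokens = "This is an example sentence .".split()
print(relative_ngram_freq(tokens, 2))
```

With six tokens there are five distinct bigrams, so each gets a share of 0.2.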
By default, this function returns the frequency distribution of bigrams. For example:
text = "This is an example sentence."
freq_dist = compute_freq(text)
Now freq_dist looks like this:
FreqDist({('is', 'an'): 1, ('example', 'sentence'): 1, ('an', 'example'): 1, ('This', 'is'): 1, ('sentence', '.'): 1})
From here you can print the keys and values like this:
for k, v in freq_dist.items():
    print(k, v)
('is', 'an') 1
('example', 'sentence') 1
('an', 'example') 1
('This', 'is') 1
('sentence', '.') 1
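Iterating over `items()` prints the n-grams in no particular frequency order. To list the most frequent ones first, `most_common(k)` returns `(item, count)` pairs sorted by descending count; `FreqDist` inherits it from `collections.Counter`. The sketch below uses a plain `Counter` and whitespace splitting so it runs without NLTK, and builds bigrams with `zip(tokens, tokens[1:])`, which only covers n = 2.

```python
from collections import Counter

tokens = "the cat sat on the mat the cat slept".split()
# Pair each token with its successor to form bigrams
bigrams = Counter(zip(tokens, tokens[1:]))

# most_common(k) returns the k (item, count) pairs with the highest counts
for gram, count in bigrams.most_common(2):
    print(gram, count)
```

The same `most_common` call works directly on the `freq_dist` returned by `compute_freq`.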
For other n-gram orders, just change the n_value argument when calling the function. For example:
freq_dist = compute_freq(text, n_value=3)  # gives the trigram distribution
('example', 'sentence', '.') 1
('an', 'example', 'sentence') 1
('This', 'is', 'an') 1
('is', 'an', 'example') 1
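If you need several orders at once (unigrams, bigrams and trigrams, say), one option is to compute one counter per order and keep them in a dict keyed by n. This is a standard-library sketch with a hypothetical helper name, `ngram_counts_by_order`, and whitespace tokenization instead of `word_tokenize`; NLTK also provides `nltk.util.everygrams`, which yields n-grams of several lengths in a single stream.

```python
from collections import Counter

def ngram_counts_by_order(tokens, max_n=3):
    """Return {n: Counter of n-grams} for every order from 1 to max_n (hypothetical helper)."""
    return {
        # Sliding window of size n built from n staggered slices of the token list
        n: Counter(zip(*(tokens[i:] for i in range(n))))
        for n in range(1, max_n + 1)
    }

tokens = "This is an example sentence .".split()
by_order = ngram_counts_by_order(tokens, max_n=3)
for n, counts in by_order.items():
    print(n, len(counts))  # number of distinct n-grams of each order
```

Keeping the orders separate makes it easy to look up, for example, only the trigram counts with `by_order[3]`.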