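I want to build a spelling corrector for Somali words using NLTK's UDHR corpus (`nltk.corpus.udhr`, whose files I listed with `nltk.corpus.udhr.fileids()`). The problem is that NLTK has no built-in spelling-correction support for this language, and training my own corrector is hard for me because I'm new to machine learning.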
```python
import nltk
nltk.download('udhr')
from nltk.corpus import udhr

print(udhr.fileids())  # lists the available UDHR translations, including 'Somali-Latin1'
somali_text = udhr.raw('Somali-Latin1')
print(len(somali_text))

# tokenize into lowercase words (list(somali_text) would split the text into single characters)
words = [w.lower() for w in udhr.words('Somali-Latin1') if w.isalpha()]
print(words)

# dictionary that maps each word to its frequency
frequency_dict = {}
for word in words:
    if word not in frequency_dict:
        frequency_dict[word] = 1
    else:
        frequency_dict[word] += 1

# sorting the dictionary by frequency
sorted_frequency_dict = sorted(frequency_dict.items(), key=lambda x: x[1], reverse=True)

# treat rare words (frequency < 3) as possibly misspelled
misspelled_words = []
for word, frequency in sorted_frequency_dict:
    if frequency < 3:
        misspelled_words.append(word)

# corrections for misspelled words
corrections = {}
for misspelled_word in misspelled_words:
    corrections[misspelled_word] = nltk.suggest(misspelled_word)  # fails: nltk has no suggest()
```
When I run the code, it raises the following error:
```
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-0f8ec385447f> in <cell line: 25>()
24 corrections = {}
25 for misspelled_word in misspelled_words:
---> 26 corrections[misspelled_word] = nltk.suggest(misspelled_word)
27
28 # Suggest corrections for the misspelled words
AttributeError: module 'nltk' has no attribute 'suggest'.
```
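NLTK does not provide a `suggest()` function, but a basic corrector does not need a trained model. Below is a minimal sketch of a Norvig-style corrector: it generates every string within one edit (delete, transpose, replace, insert) of the input, falls back to two edits, and ranks the candidates that appear in the vocabulary by corpus frequency. The helpers `build_counts`, `edits1` and `suggest` are hypothetical names for illustration, not NLTK API, and the alphabet is assumed to be plain `a`-`z` (Somali's Latin orthography also uses the apostrophe, which this sketch ignores). The toy word list is only for demonstration, not real corpus frequencies.

```python
import string
from collections import Counter

# Assumption: Somali Latin orthography approximated by ASCII a-z.
LETTERS = string.ascii_lowercase


def build_counts(words):
    """Count lowercase alphabetic tokens; the counts act as the vocabulary."""
    return Counter(w.lower() for w in words if w.isalpha())


def edits1(word, letters=LETTERS):
    """All strings exactly one delete/transpose/replace/insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, counts, n=3):
    """Return up to n known words closest to word, most frequent first."""
    word = word.lower()
    if word in counts:
        return [word]  # already a known word
    candidates = {w for w in edits1(word) if w in counts}
    if not candidates:
        # fall back to words two edits away
        candidates = {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in counts}
    # higher frequency first; ties broken alphabetically for stable output
    return sorted(candidates, key=lambda w: (-counts[w], w))[:n]


if __name__ == "__main__":
    toy_counts = build_counts(["xuquuqda", "aadanaha", "dadka", "dadka"])
    print(suggest("dadk", toy_counts))
    print(suggest("xuquuda", toy_counts))
```

To use it on the corpus, build the counts with `build_counts(udhr.words('Somali-Latin1'))` and call `suggest(misspelled_word, counts)` in place of `nltk.suggest`. Note that the UDHR is a single short document, so its vocabulary is small: many correctly spelled Somali words will be missing or appear fewer than 3 times, and a larger Somali word list would give more reliable suggestions.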