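Following up on my previous question, I changed the code, but it still does not work.
I have a Python script that reads text and applies preprocessing functions so the text can be analyzed. I want to count how often each word occurs, but the script crashes with the following error.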
File "F:\AIenv\textAnalysis\setup.py", line 219, in tag_and_save
    file.write(word+"/"+tag+" (frequency="+ freq_tagged_data[word] +")\n")
TypeError: list indices must be integers or slices, not str
from nltk import FreqDist

def get_freq(tagged):
    freq_dist = {}
    freqs = FreqDist(tagged)
    freq_dist = [(word, freq) for word, freq in freqs.items()]
    # print(freq_dist)
    return freq_dist
import operator

def tag_and_save(tagger, text, path):
    clt = clean_text(text)
    tagged_data = tagger.tag(clt)
    tagged_data = sorted(tagged_data, key=operator.itemgetter(1))
    freq_tagged_data = get_freq(tagged_data)
    file = open(path, "w", encoding="UTF8")
    for word, tag in tagged_data:
        file.write(word+"/"+tag+" (frequency="+ freq_tagged_data[word] +")\n")
    file.close()
If I try to convert the word with int():
def tag_and_save(tagger, text, path):
    clt = clean_text(text)
    tagged_data = tagger.tag(clt)
    tagged_data = sorted(tagged_data, key=operator.itemgetter(1))
    freq_tagged_data = get_freq(tagged_data)
    file = open(path, "w", encoding="UTF8")
    for word, tag in tagged_data:
        file.write(word+"/"+tag+" (frequency="+ freq_tagged_data[int(word)] +")\n")
    file.close()
it shows the following error:
File "F:\AIenv\textAnalysis\setup.py", line 219, in tag_and_save
    file.write(word+"/"+tag+" (frequency="+ freq_tagged_data[int(word)] +")\n")
ValueError: invalid literal for int() with base 10: ''
The expected output should look like this:
('***** / DTNN')3
def get_freq(tagged):
    # freq_dist is a dict here
    freq_dist = {}
    freqs = FreqDist(tagged)
    # freq_dist is a list now
    freq_dist = [(word, freq) for word, freq in freqs.items()]
    return freq_dist
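To make the cause concrete, here is a small sketch of what happens when a list of pairs is indexed with a string, compared with a dict lookup. It is only an illustration under some assumptions: `collections.Counter` stands in for `FreqDist` (which is a `Counter` subclass in NLTK 3) so the snippet runs without NLTK, and the `tagged` sample is made up. Note that the keys are whole `(word, tag)` tuples, because that is what gets counted.

```python
from collections import Counter

# Hypothetical tagged tokens standing in for tagger.tag(clean_text(text)).
tagged = [("cat", "NN"), ("the", "DT"), ("cat", "NN")]

# Counter is used here as a dependency-free stand-in for nltk.FreqDist.
freqs = Counter(tagged)

# What get_freq() currently returns: a list of ((word, tag), freq) pairs.
as_list = [(item, freq) for item, freq in freqs.items()]
# What the lookup needs: a mapping from (word, tag) to its count.
as_dict = dict(freqs)

try:
    as_list["cat"]  # a list only accepts integer or slice indices
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(as_dict[("cat", "NN")])  # dict lookup by the (word, tag) key works
```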
You have several options for initializing the dictionary:
dict comprehension
def get_freq(tagged):
    freqs = FreqDist(tagged)
    return {word: freq for word, freq in freqs.items()}
update() method
See the update() documentation for details.
def get_freq(tagged):
    freq_dist = {}
    freqs = FreqDist(tagged)
    freq_dist.update([(word, freq) for word, freq in freqs.items()])
    return freq_dist
dict constructor
def get_freq(tagged):
    freqs = FreqDist(tagged)
    # Build the dict from a list of (word, freq) pairs
    return dict([(word, freq) for word, freq in freqs.items()])
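Returning a dict is probably not the only change the writing loop needs. Since the frequencies are counted over `(word, tag)` tuples, the lookup key has to be the tuple rather than the bare word, and the integer count has to be converted to text before it is concatenated. Below is a rough sketch of how the pieces could fit together. It rests on assumptions: `save_tagged` is a hypothetical helper that takes already tagged data (the question's `tagger` and `clean_text` are not shown, so they are left out), `Counter` again stands in for `FreqDist`, and the sample data is invented.

```python
import operator
import os
import tempfile
from collections import Counter


def get_freq(tagged):
    # Counter stands in for nltk.FreqDist; keys are (word, tag) tuples.
    return dict(Counter(tagged))


def save_tagged(tagged_data, path):
    """Hypothetical helper: write each distinct word/tag pair with its count."""
    freq_tagged_data = get_freq(tagged_data)
    # Sort the distinct pairs by tag, as the question's code does.
    pairs = sorted(freq_tagged_data, key=operator.itemgetter(1))
    with open(path, "w", encoding="UTF8") as file:
        for word, tag in pairs:
            freq = freq_tagged_data[(word, tag)]  # look up by the full tuple
            # The f-string converts the integer count to text.
            file.write(f"{word}/{tag} (frequency={freq})\n")


# Made-up sample standing in for tagger.tag(clean_text(text)).
sample = [("cat", "NN"), ("the", "DT"), ("cat", "NN")]
out_path = os.path.join(tempfile.mkdtemp(), "tagged.txt")
save_tagged(sample, out_path)

with open(out_path, encoding="UTF8") as handle:
    written = handle.read()
print(written)
```

Iterating over the dict keys writes each pair once. Looping over `tagged_data` instead, as in the question, would repeat the line for every occurrence of a pair.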