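After my two previous questions (question 1, question 2), I still can't solve the problem.
I have a Python script that cleans text before the analysis step.
It has functions that clean the text and apply POS tagging so the text can be split and tokenized. I need to return each word, its tag, and its frequency.
The problem is that the function works on a list of tuples and then crashes with the following error: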
    File "F:\AIenv\textAnalysis\setup.py", line 221, in tag_and_save
      file.write("{0} /{1} {2} \n".format(word,tag,freq_tagged_data[word]))
    TypeError: list indices must be integers or slices, not str
    import operator

    from nltk import FreqDist

    def get_freq(tagged):
        freq_dist = {}
        freqs = FreqDist(tagged)
        freq_dist = [(word, freq) for word, freq in freqs.items()]
        # print(freq_dist)
        return freq_dist

    def tag_and_save(tagger, text, path):
        clt = clean_text(text)
        tagged_data = tagger.tag(clt)
        print("tagged_data\n\n\n", tagged_data)  # here it's a list of tuples [('', '')]
        tagged_data = sorted(tagged_data, key=operator.itemgetter(1))
        freq_tagged_data = get_freq(tagged_data)
        file = open(path, "w", encoding="UTF8")
        for word, tag in tagged_data:
            file.write("{0} /{1} {2} \n".format(word, tag, freq_tagged_data[word]))  # the error is here
        file.close()
Expected output: ("***** / POS tag") number of occurrences.
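Change

    freq_dist = [(word, freq) for word, freq in freqs.items()]

to

    for word, freq in freqs.items():
        freq_dist[word] = freq

That may solve the problem, because the original line replaces the dictionary with a list, and a list cannot be indexed by a string.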
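To make the difference concrete, here is a minimal sketch of a dictionary-returning `get_freq`. It uses `collections.Counter` as a stand-in for `nltk.FreqDist` (which subclasses `Counter`) so it runs without NLTK, and the sample `tagged` list is invented for illustration.

```python
from collections import Counter

def get_freq(tagged):
    """Return a dict mapping each (word, tag) pair to its count."""
    # Counter stands in for nltk.FreqDist (assumption, to keep the demo dependency-free).
    return dict(Counter(tagged))

# Hypothetical tagger output: a list of (word, tag) tuples.
tagged = [("the", "DT"), ("cat", "NN"), ("the", "DT")]

freqs = get_freq(tagged)
print(freqs[("the", "DT")])  # 2

# The original code built a list of tuples instead; indexing it by a string fails.
as_list = list(freqs.items())
try:
    as_list["the"]
except TypeError as exc:
    print("TypeError:", exc)
```

Note that the keys are whole `(word, tag)` tuples, because that is what was counted.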
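In tag_and_save, also skip empty words and tags. Note that FreqDist(tagged) counts (word, tag) tuples, so the dictionary keys are those tuples and the lookup has to use (word, tag) rather than word alone. Try:

    for word, tag in tagged_data:
        if word and tag:
            file.write("{0} /{1} {2} \n".format(word, tag, freq_tagged_data[(word, tag)]))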
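Putting both suggestions together, the writing step could be sketched as below. `write_tagged_freqs` is a hypothetical helper name, it takes already-tagged data instead of calling the tagger, and `Counter` again stands in for `FreqDist`. Writing each distinct word/tag pair only once is an assumption about the desired output.

```python
import os
import tempfile
from collections import Counter

def write_tagged_freqs(tagged_data, path):
    """Write one 'word/tag count' line per distinct (word, tag) pair."""
    freqs = Counter(tagged_data)  # keys are (word, tag) tuples
    # Sort by tag, then by word, so the output order is deterministic.
    ordered = sorted(freqs.items(), key=lambda item: (item[0][1], item[0][0]))
    with open(path, "w", encoding="utf-8") as out:
        for (word, tag), count in ordered:
            if word and tag:  # skip empty words or tags
                out.write("{0}/{1} {2}\n".format(word, tag, count))

# Hypothetical sample data, including one empty word that should be skipped.
tagged = [("the", "DT"), ("cat", "NN"), ("the", "DT"), ("", "NN")]
path = os.path.join(tempfile.mkdtemp(), "tags.txt")
write_tagged_freqs(tagged, path)

with open(path, encoding="utf-8") as f:
    result = f.read()
print(result)
```

Using `with open(...)` closes the file even if a write fails, which the explicit `file.close()` in the question does not guarantee.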