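My code starts with a string that contains a sentence, with each word tagged with the category it belongs to. I then store this information in a two-dimensional list: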
cadena = "El/DT perro/N come/V carne/N de/P la/DT carnicería/N y/C de/P la/DT nevera/N y/C canta/V el/DT la/N la/N la/N ./Fp"
# convert string into list
cadena_list = []
for i in cadena.split():
    cadena_list.append(i.split("/"))
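My goal is to store this information in a dictionary where each key is a word and each value is a two-dimensional list showing which categories the word belongs to and how many times it appears in the sentence under each category. For example, the dictionary entry for the word "la" should look like this: words_categories = {"la": [["DT", 2], ["N", 3]]}.
My code looks like this: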
# create a dictionary that contains the categories and their respective frequencies for every word
words_categories = {}
for i in range(len(cadena_list)):
    if (cadena_list[i][0] in words_categories):
        l = words_categories[cadena_list[i][0]]
        if (cadena_list[i][1] in l):
            words_categories[cadena_list[i][0]][l.index(cadena_list[i][1]) + 1] += 1
        else:
            words_categories[cadena_list[i][0]].append([cadena_list[i][1], 1])
    else:
        words_categories[cadena_list[i][0]] = [cadena_list[i][1], 1]
print(words_categories["la"])
This is the output: ['DT', 2, ['N', 1], ['N', 1], ['N', 1]], but what I want is this: [['DT', 2], ['N', 3]]
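One likely reason for that shape, shown as a minimal sketch: the first category is stored flat as ['DT', 1], while later categories are appended as nested lists, so the `in` test only ever sees the top-level elements. The name `entry` below is illustrative and does not appear in the code above; it mimics the value stored for "la" after the first la/N token.

```python
# Illustrative only: `entry` mimics words_categories["la"] after the first la/N token.
entry = ['DT', 2, ['N', 1]]

# `in` compares against the top-level elements: 'DT', 2 and the list ['N', 1].
found_dt = 'DT' in entry  # True: 'DT' is a top-level element
found_n = 'N' in entry    # False: 'N' only occurs inside the nested list

# Because the second test fails, the loop appends another ['N', 1]
# instead of incrementing the existing count.
print(found_dt, found_n)
```

Storing every category as its own [category, count] pair, including the first one, would keep the structure uniform.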
You may find this easier to manage with a defaultdict, something like this:
from collections import defaultdict
cadena = "El/DT perro/N come/V carne/N de/P la/DT carnicería/N y/C de/P la/DT nevera/N y/C canta/V el/DT la/N la/N la/N ./Fp"
# each word maps to a list of [category, count] pairs
outdict = defaultdict(list)
for t in cadena.split():
    x, y = t.split('/')
    for e in outdict[x]:
        if e[0] == y:
            e[1] += 1
            break
    else:
        # the for loop finished without break: first time this category is seen for this word
        outdict[x].append([y, 1])
for entry in outdict.items():
    print(*entry)
Output:
El [['DT', 1]]
perro [['N', 1]]
come [['V', 1]]
carne [['N', 1]]
de [['P', 2]]
la [['DT', 2], ['N', 3]]
carnicería [['N', 1]]
y [['C', 2]]
nevera [['N', 1]]
canta [['V', 1]]
el [['DT', 1]]
. [['Fp', 1]]
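As a possible variation on the approach above, here is a sketch that keeps one Counter per word and converts to the list-of-lists layout at the end. This swaps in `collections.Counter` for the inner list scan; the names `counts`, `token`, `word` and `category` are my own and not from the code above.

```python
from collections import Counter, defaultdict

cadena = "El/DT perro/N come/V carne/N de/P la/DT carnicería/N y/C de/P la/DT nevera/N y/C canta/V el/DT la/N la/N la/N ./Fp"

# Assumption: one Counter of categories per word, instead of a list of [category, count] pairs.
counts = defaultdict(Counter)
for token in cadena.split():
    word, category = token.split('/')
    counts[word][category] += 1

# Convert to the {word: [[category, count], ...]} layout asked for in the question.
words_categories = {w: [[c, n] for c, n in cnt.items()] for w, cnt in counts.items()}
print(words_categories["la"])
```

This avoids the inner loop, at the cost of a conversion step if the list-of-lists form is really needed.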