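I need to create a dictionary by iterating over three lists. The keys are the matching sentences (from list_sent) and the values are the word lists (from list_wordset) that match the keywords (list_keywords). The lists and the expected output dictionary are shown below, followed by an explanation. Any suggestions are appreciated.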
list_sent = ['one more shock like Covid-19',
'The number of people suffering acute',
'people must collectively act now',
'handling the novel coronavirus outbreak',
'After a three-week nationwide',
'strengthening medical quarantine']
list_wordset = [['people','suffering','acute'],
['Covid-19','Corona','like'],
['people','jersy','country'],
['novel', 'coronavirus', 'outbreak']]
list_keywords = ['people', 'Covid-19', 'nationwide','quarantine','handling']
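The keyword 'Covid-19' appears in both list_sent and list_wordset, so it should be captured in the dictionary. The keyword 'people' appears in two different items of list_sent and in two different lists of list_wordset, so both need to be captured. A wordset counts as a match even if only a single word in it matches a keyword.
Expected output: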
out_dict =
{'one more shock like Covid-19': ['Covid-19','Corona','like'],
'The number of people suffering acute': [['people','suffering','acute'],['people','jersy','country']],
'people must collectively act now' : [['people','suffering','acute'],['people','jersy','country']]}
>>> {sent: [
wordset for wordset in list_wordset if any(word in sent for word in wordset)
] for sent in list_sent}
{'one more shock like Covid-19': [['Covid-19', 'Corona', 'like']],
'The number of people suffering acute': [['people', 'suffering', 'acute'], ['people', 'jersy', 'country']],
'people must collectively act now': [['people', 'suffering', 'acute'], ['people', 'jersy', 'country']],
'handling the novel coronavirus outbreak': [['novel', 'coronavirus', 'outbreak']],
'After a three-week nationwide': [],
'strengthening medical quarantine': []}
I was able to build the required dictionary using all three lists. Removing the empty values takes one extra step.
out_dict = {sent: [wordset for wordset in list_wordset if any(key in sent and key in wordset for key in list_keywords)]
for sent in list_sent}
Result:
{'one more shock like Covid-19': [['Covid-19', 'Corona', 'like']],
'The number of people suffering acute': [['people', 'suffering', 'acute'],
['people', 'jersy', 'country']],
'people must collectively act now': [['people', 'suffering', 'acute'],
['people', 'jersy', 'country']],
'handling the novel coronavirus outbreak': [],
'After a three-week nationwide': [],
'strengthening medical quarantine': []}
To remove the entries whose value is an empty list:
out_dict = {k: v for k, v in out_dict.items() if v}
Final result:
{'one more shock like Covid-19': [['Covid-19', 'Corona', 'like']],
'The number of people suffering acute': [['people', 'suffering', 'acute'],
['people', 'jersy', 'country']],
'people must collectively act now': [['people', 'suffering', 'acute'],
['people', 'jersy', 'country']]}
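The two steps above (build the dictionary, then drop the empty values) can also be folded into a single pass. The following is only a sketch under a few assumptions: the helper name `build_keyword_dict` is hypothetical, and sentences are split on whitespace so that a keyword must match a whole word, which is stricter than the substring test `key in sent` used above. For the sample data both approaches give the same result.

```python
list_sent = ['one more shock like Covid-19',
             'The number of people suffering acute',
             'people must collectively act now',
             'handling the novel coronavirus outbreak',
             'After a three-week nationwide',
             'strengthening medical quarantine']

list_wordset = [['people', 'suffering', 'acute'],
                ['Covid-19', 'Corona', 'like'],
                ['people', 'jersy', 'country'],
                ['novel', 'coronavirus', 'outbreak']]

list_keywords = ['people', 'Covid-19', 'nationwide', 'quarantine', 'handling']


def build_keyword_dict(sentences, wordsets, keywords):
    """Hypothetical helper: map each sentence to the wordsets that share a keyword with it."""
    out = {}
    for sent in sentences:
        # Assumption: whitespace tokenisation, so only whole words can match.
        tokens = set(sent.split())
        # Keywords that actually occur in this sentence.
        hits = [kw for kw in keywords if kw in tokens]
        # Wordsets that contain at least one of those keywords.
        matched = [ws for ws in wordsets if any(kw in ws for kw in hits)]
        # Skip sentences with no matching wordset, so no cleanup step is needed.
        if matched:
            out[sent] = matched
    return out


out_dict = build_keyword_dict(list_sent, list_wordset, list_keywords)
print(out_dict)
```

Whole-word matching avoids accidental hits such as a keyword appearing inside a longer word. If substring behaviour is what you want, replace `kw in tokens` with `kw in sent`.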