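I have a function, pmi_count_phrase_create(), that takes two lists as input. I want it to return a DataFrame built from the lists I create inside the function. The problem is that I currently get a tuple containing three lists.
Here is my code: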
    def pmi_count_phrase_create(pmi_tups,freq_list):
        import pandas as pd
        """pmi_tups is result of running pmi_tups = [i for i in finder.score_ngrams(bigram_measures.pmi)]
        freq_list is a result of running freq_list= finder.ngram_fd.items()
        -> 1 df made up of columns for pmi list, count list, phrase list"""
        pmi3_list =[]
        count3_list =[]
        phrase3_list =[]
        for phrase, pmi in pmi_tups:
            for item in freq_list:
                quadgram,count = item
                if quadgram == phrase:
                    pmi3_list.append(pmi)
                    count3_list.append(count)
                    phrase3_list.append(phrase)
        # create dataframe
        df = pd.DataFrame({'Phrase':phrase3_list,'PMI':pmi3_list,'Count':count3_list})
        return df
My current output is a tuple of the form (pmi3_list, count3_list, phrase3_list).
What I want is a DataFrame:
                          Phrase        PMI  Count
    0      (activated, charcoal)  15.213655     12
    1             (vocal, cords)  14.950620     10
    2            (fossil, fuels)  14.872618     15
    3  (simplified, explanation)  14.669334     12
    4        (midnight, weekend)  14.181233     11
You could try:
    def pmi_count_phrase_create(pmi_tups,freq_list):
        import pandas as pd
        """pmi_tups is result of running pmi_tups = [i for i in finder.score_ngrams(bigram_measures.pmi)]
        freq_list is a result of running freq_list= finder.ngram_fd.items()
        -> 1 df made up of columns for pmi list, count list, phrase list"""
        pmi3_list =[]
        count3_list =[]
        phrase3_list =[]
        for phrase, pmi in pmi_tups:
            for item in freq_list:
                quadgram,count = item
                if quadgram == phrase:
                    pmi3_list.append(pmi)
                    count3_list.append(count)
                    phrase3_list.append(phrase)
        # create dataframe
        df = pd.DataFrame.from_dict({'Phrase':phrase3_list,'PMI':pmi3_list,'Count':count3_list})
        return df
The only change is that it uses pd.DataFrame.from_dict instead of pd.DataFrame.
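For reference, here is a minimal, self-contained sketch of the same idea that can be run without NLTK. The sample values in pmi_tups and freq_list are hypothetical stand-ins shaped like the output of finder.score_ngrams(...) and finder.ngram_fd.items(). It also replaces the nested loop with a dictionary lookup, which assumes each n-gram appears only once in freq_list. Either constructor, pd.DataFrame or pd.DataFrame.from_dict, accepts a dict of equal-length lists.

```python
import pandas as pd


def pmi_count_phrase_create(pmi_tups, freq_list):
    """Return one DataFrame with Phrase, PMI and Count columns.

    pmi_tups:  iterable of (ngram, pmi_score) pairs
    freq_list: iterable of (ngram, count) pairs
    """
    # Build the ngram -> count mapping once instead of rescanning
    # freq_list for every phrase (assumes n-grams are unique in freq_list).
    counts = dict(freq_list)

    # Keep only phrases that have a count, preserving the order of pmi_tups.
    matched = [
        (phrase, pmi, counts[phrase])
        for phrase, pmi in pmi_tups
        if phrase in counts
    ]

    # Dict of equal-length lists; key order becomes column order.
    return pd.DataFrame.from_dict({
        "Phrase": [phrase for phrase, _, _ in matched],
        "PMI": [pmi for _, pmi, _ in matched],
        "Count": [count for _, _, count in matched],
    })


# Hypothetical sample data, not real NLTK output.
pmi_tups = [
    (("activated", "charcoal"), 15.213655),
    (("vocal", "cords"), 14.950620),
]
freq_list = [
    (("vocal", "cords"), 10),
    (("activated", "charcoal"), 12),
]

df = pmi_count_phrase_create(pmi_tups, freq_list)
print(isinstance(df, pd.DataFrame))
print(df)
```

If the call still yields a tuple of three lists, it may be worth checking that the function definition actually being executed ends with return df and that the result is assigned to a single name.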