I have data like this, which is the output of a gensim LDA model:
date id score
1/1/2019 11 [(5,0.8), (11,0.2)]
1/2/2019 21 [(4,0.7), (10,0.1)]
1/3/2019 35 [(3,0.4)]
1/4/2019 44 [(5,0.8),(3,0.5), (11,0.2)]
The result should look like this. Can anyone help?
date id score new_score
1/1/2019 11 5 0.8
1/1/2019 11 11 0.2
1/2/2019 21 4 0.7
1/2/2019 21 10 0.1
1/3/2019 35 3 0.4
1/4/2019 44 5 0.8
1/4/2019 44 3 0.5
1/4/2019 44 11 0.2
Update:
A better approach is to use DataFrame.explode():
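import ast
import pandas as pd
# read_csv loads 'score' as text, so parse it into a list of tuples first
df = pd.read_csv('your_file_name.csv', converters={'score': ast.literal_eval})
df = df.explode('score').reset_index(drop=True)  # one row per (topic, weight) tuple
df[['score', 'new_score']] = pd.DataFrame(df['score'].tolist(), index=df.index)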
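To try the explode() approach without a CSV file, here is a minimal sketch that builds the sample frame from the question in memory. It assumes the score column already holds real lists of tuples. The names exploded, pairs, and result are only illustrative, and building the two columns with pd.DataFrame(...tolist()) is a swapped-in alternative to apply(pd.Series), which tends to be slower on large frames.

```python
import pandas as pd

# Sample data copied from the question; 'score' holds lists of (topic, weight) tuples.
df = pd.DataFrame({
    'date': ['1/1/2019', '1/2/2019', '1/3/2019', '1/4/2019'],
    'id': [11, 21, 35, 44],
    'score': [
        [(5, 0.8), (11, 0.2)],
        [(4, 0.7), (10, 0.1)],
        [(3, 0.4)],
        [(5, 0.8), (3, 0.5), (11, 0.2)],
    ],
})

# One row per (topic, weight) tuple; reset the index so it is unique again.
exploded = df.explode('score').reset_index(drop=True)

# Split each tuple into two columns and attach them to date and id.
pairs = pd.DataFrame(exploded['score'].tolist(), columns=['score', 'new_score'])
result = pd.concat([exploded[['date', 'id']], pairs], axis=1)
print(result)
```

Resetting the index after explode() matters here because explode() repeats the original row label for every tuple, and pd.concat along axis=1 aligns on the index.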
You can do this with a nested list comprehension:
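import ast
import pandas as pd
# read_csv loads 'score' as text, so parse it into a list of tuples first
df = pd.read_csv('your_file_name.csv', converters={'score': ast.literal_eval})
unpacked = [
    {'date': row['date'], 'id': row['id'], 'score': x[0], 'new_score': x[1]}
    for _, row in df.iterrows() for x in row['score']
]
df = pd.DataFrame(unpacked)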
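DataFrame.iterrows() yields, for each row in the frame, a tuple of the row's index and its contents, which is why the comprehension above unpacks each pair as `_, row` and then walks the tuples in that row's score list.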
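As a small illustration of that unpacking, the sketch below uses a hypothetical two-row frame whose score column is still text, as it typically would be straight after read_csv. Parsing with ast.literal_eval is an assumption about how the column was stored, and the variable names are only for illustration.

```python
import ast
import pandas as pd

# Hypothetical frame as it might look right after read_csv: 'score' is still text.
df = pd.DataFrame({
    'date': ['1/3/2019', '1/4/2019'],
    'id': [35, 44],
    'score': ['[(3,0.4)]', '[(5,0.8),(3,0.5), (11,0.2)]'],
})

# Turn each string into a real list of (topic, weight) tuples.
df['score'] = df['score'].apply(ast.literal_eval)

# iterrows() yields (index, row) pairs; row is a Series keyed by column name.
first_index, first_row = next(df.iterrows())

# Outer loop: rows of the frame. Inner loop: tuples inside that row's score list.
unpacked = [
    {'date': row['date'], 'id': row['id'], 'score': topic, 'new_score': weight}
    for _, row in df.iterrows()
    for topic, weight in row['score']
]
result = pd.DataFrame(unpacked)
print(result)
```

iterrows() is convenient for small frames like this one, but it builds a Series per row, so the explode() route is usually the better fit for large inputs.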