I have a column that contains lists of tuples, and I would like to expand these tuples into new columns. See the example below:
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3],
                       b=['a', 'a', 'b'],
                       c=[[('pear', 1), ('apple', 2)], [('pear', 7), ('orange', 1)], [('apple', 9)]]))
df
   a  b                         c
0  1  a   [(pear, 1), (apple, 2)]
1  2  a  [(pear, 7), (orange, 1)]
2  3  b              [(apple, 9)]
and I would like to convert it to:
   a  b   fruit  value
0  1  a    pear      1
1  1  a   apple      2
2  2  a    pear      7
3  2  a  orange      1
4  3  b   apple      9
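I can do this, but not very efficiently, and in my case I have more than 500K rows. Is there a more efficient way?
Note: I am using pandas 0.21 and cannot upgrade at the moment because of my project's requirements.
Thanks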
Give this a try and see whether it works on your version:
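from itertools import product, chain
# build a cartesian product for each row in df; a and b are wrapped in lists
# so that a multi-character string in b is not split into characters
# (df.values is used because DataFrame.to_numpy requires pandas 0.24+)
phase1 = (product([a], [b], c) for a, b, c in df.values)
# unpack the (fruit, value) tuple in each flattened entry
phase2 = [(a, b, *c) for a, b, c in chain.from_iterable(phase1)]
# create the dataframe
result = pd.DataFrame(phase2, columns=["a", "b", "fruit", "value"])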
   a  b   fruit  value
0  1  a    pear      1
1  1  a   apple      2
2  2  a    pear      7
3  2  a  orange      1
4  3  b   apple      9
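One caveat about product: it iterates over every argument it is given, so a string passed bare is split into its characters. That goes unnoticed with the single-character labels in the question's data. The sketch below uses a made-up row (the label "ab" is hypothetical, not from the question) to show the difference between passing the string bare and wrapping it in a one-element list:

```python
from itertools import product

# Hypothetical row: the two-character label "ab" is not in the question's data.
a, b, c = 1, "ab", [("pear", 1), ("apple", 2)]

# String passed bare: product iterates over its characters "a" and "b".
bare = [(x, y, *z) for x, y, z in product([a], b, c)]

# String wrapped in a list: it is treated as a single value.
wrapped = [(x, y, *z) for x, y, z in product([a], [b], c)]

print(len(bare))  # 4 rows instead of the expected 2
print(wrapped)    # [(1, 'ab', 'pear', 1), (1, 'ab', 'apple', 2)]
```

If column b can hold longer strings, wrapping it as [b] is the safer form.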
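The idea is to reshape the values into a new DataFrame with a list comprehension, and then join it back to the original with DataFrame.merge.
You could try something like this: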
# pop removes column c from df; k is the original row index, x is one (fruit, value) tuple
df1 = pd.DataFrame([(k, *x) for k, v in df.pop('c').items() for x in v],
                   columns=['i', 'fruit', 'value'])
print(df1)
   i   fruit  value
0  0    pear      1
1  0   apple      2
2  1    pear      7
3  1  orange      1
4  2   apple      9
# join on the original row index stored in i, then drop the helper column
df = df.merge(df1, left_index=True, right_on='i').drop('i', axis=1)
print(df)
   a  b   fruit  value
0  1  a    pear      1
1  1  a   apple      2
2  2  a    pear      7
3  2  a  orange      1
4  3  b   apple      9
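As a variation on the same idea, the merge can be skipped by repeating the scalar columns once per tuple with numpy.repeat. This is a different technique from the merge shown above, offered only as a minimal sketch: it assumes every entry in c is a list (possibly empty) of 2-tuples, it rebuilds the question's df so that it runs on its own, and it has not been benchmarked on 500K rows.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the sketch is self-contained.
df = pd.DataFrame(dict(a=[1, 2, 3],
                       b=['a', 'a', 'b'],
                       c=[[('pear', 1), ('apple', 2)],
                          [('pear', 7), ('orange', 1)],
                          [('apple', 9)]]))

# Number of tuples in each row's list.
lengths = [len(v) for v in df['c']]

# Flatten all tuples into one list, preserving row order.
flat = [t for row in df['c'] for t in row]

# Repeat a and b once per tuple so they line up with the flattened tuples.
repeated = pd.DataFrame({'a': np.repeat(df['a'].values, lengths),
                         'b': np.repeat(df['b'].values, lengths)})
pairs = pd.DataFrame(flat, columns=['fruit', 'value'])

# Both frames share the same default integer index, so concat aligns them row by row.
result = pd.concat([repeated, pairs], axis=1)
print(result)
```

Rows whose list is empty simply disappear from the result, because they are repeated zero times.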