I have a dataframe like the one below (it can be provided either with the foods as arrays or in un-nested form):
team | player | favorite_food
A | A_player1 | [pizza, sushi]
A | A_player2 | [salad, sushi]
B | B_player1 | [pizza, pasta, salad, taco]
B | B_player2 | [taco, salad, sushi]
B | B_player3 | [taco]
I want to get the number and the percentage of foods that all players on each team have in common. Something like this:
team | #_food_common | percent_food_common
A | 1 | 0.33
B | 1 | 0.2
What is a good way to do this in Python, preferably with pandas?
Use set operations and groupby.agg:
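# Convert each list to a set, then aggregate the sets per team.
(df['favorite_food'].apply(set)
   .groupby(df['team'])
   .agg(**{# number of foods shared by every player on the team
           '#_food_common': lambda x: len(set.intersection(*x)),
           # shared foods as a fraction of all foods named on the team
           'percent_food_common': lambda x: len(set.intersection(*x))/len(set.union(*x)),
          })
   .reset_index()
)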
Output:
  team  #_food_common  percent_food_common
0    A              1             0.333333
1    B              1             0.200000
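To see where these numbers come from, here is a small sketch that repeats the calculation for team B with plain Python sets and no pandas. The variable names are only illustrative; the data is copied from the question.

```python
# Favorite foods of team B's three players, copied from the question.
team_b = [
    {"pizza", "pasta", "salad", "taco"},
    {"taco", "salad", "sushi"},
    {"taco"},
]

common = set.intersection(*team_b)   # foods that every player likes
all_foods = set.union(*team_b)       # foods that at least one player likes

n_common = len(common)
percent_common = n_common / len(all_foods)

print(sorted(common), n_common, percent_common)
```

Only taco appears in all three sets, and the team names 5 distinct foods in total, so the ratio is 1/5 = 0.2, which matches the row for team B.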
Input used:
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})
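The question also mentions an un-nested format. Assuming that means one row per (team, player, food), which is my reading and not something the question spells out, a possible sketch is to first collect each player's foods into a set and then apply the same aggregation per team. The name df_long is hypothetical, and it is built here with explode only so that the example is self-contained.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                   'player': ['A_player1', 'A_player2', 'B_player1', 'B_player2', 'B_player3'],
                   'favorite_food': [['pizza', 'sushi'],
                                     ['salad', 'sushi'],
                                     ['pizza', 'pasta', 'salad', 'taco'],
                                     ['taco', 'salad', 'sushi'],
                                     ['taco']]})

# Assumed un-nested layout: one row per (team, player, food).
df_long = df.explode('favorite_food').reset_index(drop=True)

# Step 1: rebuild one set of foods per player.
player_sets = df_long.groupby(['team', 'player'])['favorite_food'].agg(set)

# Step 2: same set-based aggregation as above, grouped on the team index level.
result = (player_sets
          .groupby(level='team')
          .agg(**{'#_food_common': lambda x: len(set.intersection(*x)),
                  'percent_food_common': lambda x: len(set.intersection(*x)) / len(set.union(*x)),
                  })
          .reset_index()
          )
print(result)
```

This should give the same two rows as the nested version, since step 1 only reconstructs the per-player sets.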