Suppose I have the following DataFrame, where each row contains a unique shopping cart:
import pandas as pd
df = pd.DataFrame({'0':['banana','apple','orange','milk'],'1':['apple','milk','bread','cheese'],'2':['bread','cheese','banana','eggs']})
        0       1       2
0  banana   apple   bread
1   apple    milk  cheese
2  orange   bread  banana
3    milk  cheese    eggs
I am trying to build a list of the most common pairings of size n across all carts. For example, the most common pairings of size 2 are banana, bread and milk, cheese:
pairing          count
banana, bread    2
milk, cheese     2
apple, bread     1
...
orange, banana   1
To clarify, order does not matter here. In other words, which item appears first in a cart is irrelevant: banana, bread is the same as bread, banana.
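I tried putting all the unique values into a list, iterating over each row, and building the pairs with itertools, but that feels like a hacky and unpythonic approach, and I could not even get it to work.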
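Use itertools.combinations and collections.Counter to loop over the combinations of each row's values (stored as a frozenset so that order is ignored), then optionally convert the result back to a Series: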
from itertools import combinations
from collections import Counter

# Count every unordered pair (as a frozenset) across all rows
out = pd.Series(Counter(frozenset(c) for r in df.to_numpy()
                        for c in combinations(r, 2)))
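Output (the order of the items shown within each pair may vary between runs, since a frozenset is unordered):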
(banana, apple)     1
(banana, bread)     2
(apple, bread)      1
(apple, milk)       1
(apple, cheese)     1
(milk, cheese)      2
(bread, orange)     1
(banana, orange)    1
(milk, eggs)        1
(eggs, cheese)      1
dtype: int64
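
If you want this for an arbitrary size n with the most frequent pairings first, the same idea could be wrapped in a small helper. The following is only a sketch: the function name count_pairings is made up for illustration, and it swaps the frozenset for a sorted, comma-joined string so the labels are deterministic and look like the table in the question. It assumes there are no missing values in the frame and that a repeated item within one cart should be counted once. Because the items are sorted, the pair appears as "cheese, milk" rather than "milk, cheese".

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def count_pairings(df, n=2):
    """Count unordered groups of n items that share a cart (row).

    Hypothetical helper for illustration; assumes no NaN values in df.
    """
    counts = Counter(
        # Sorting the row gives each group a single canonical label,
        # so "banana, bread" and "bread, banana" are counted together.
        ", ".join(combo)
        for row in df.to_numpy()
        for combo in combinations(sorted(set(row)), n)
    )
    # Most frequent pairings first
    return pd.Series(counts, name="count").sort_values(ascending=False)


df = pd.DataFrame({'0': ['banana', 'apple', 'orange', 'milk'],
                   '1': ['apple', 'milk', 'bread', 'cheese'],
                   '2': ['bread', 'cheese', 'banana', 'eggs']})

result = count_pairings(df, n=2)
print(result.head())
```

If only the top few pairings are needed and a plain list is fine, Counter.most_common(k) on the Counter itself would avoid building the Series at all.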