Iterate over the rows of a DataFrame with n rows and 6 columns and count the frequency of combinations 1:n
Non-working template code:
import pandas as pd
import itertools
from collections import Counter

# create sample data
df = pd.DataFrame([
    [2, 10, 18, 31, 41],
    [12, 27, 28, 39, 42]
])

def get_combinations(row)
    all_combinations[]
    for i in range(1, len(df)+1):
        result = list(itertools.combinations(df, i))
    return all_combinations

# get all possible combinations of values in a row
all_rows = df.apply(get_combinations, 1).values
all_rows_flatten = list(itertools.chain.from_iterable(all_rows))

# use Counter to count how many there are of each combination
count_combinations = Counter(all_rows_flatten)
print(all_combinations["count_combinations"])
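Your sample data contained only unique combinations, so I changed it slightly.
Using a generator is more memory-efficient than computing all combinations at once, and iterating over the values only once is more CPU-efficient (chaining built-ins such as flatten and Counter makes multiple passes over the data).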
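To make the memory point concrete, here is a minimal sketch that compares a fully built list with a generator over the same row. It is not part of the code in this thread: `all_combinations_list` and `all_combinations_lazy` are names made up for illustration, and the 15-value row is arbitrary. A row of k values has 2**k - 1 non-empty combinations, so the list grows quickly while the generator object itself stays small.

```python
import itertools
import sys

def all_combinations_list(values):
    # Illustrative helper: builds every combination up front.
    return [c for i in range(1, len(values) + 1)
            for c in itertools.combinations(values, i)]

def all_combinations_lazy(values):
    # Illustrative helper: yields combinations one at a time.
    for i in range(1, len(values) + 1):
        yield from itertools.combinations(values, i)

row = list(range(15))  # arbitrary example row
eager = all_combinations_list(row)
lazy = all_combinations_lazy(row)

# Number of combinations held in memory by the list (2**15 - 1).
print(len(eager))
# Size of the list object versus the generator object, in bytes.
print(sys.getsizeof(eager), sys.getsizeof(lazy))
```

`sys.getsizeof` only measures the container itself, not the tuples it references, so the real gap is larger than the printed numbers suggest.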
This should do the trick:
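import pandas as pd
import itertools
from collections import defaultdict

df = pd.DataFrame([
    [2, 10, 18, 31, 41],
    [12, 27, 28, 39, 42],
    [12, 4, 18, 6, 41]
])

def combination_generator(values):
    # yield every combination of sizes 1..len(values);
    # each one is sorted so that the order of values does not matter
    for i in range(1, len(values) + 1):
        for c in itertools.combinations(values, i):
            yield tuple(sorted(c))

combination_count = defaultdict(int)

# orient='index' gives {row_label: {column: value}}, so each entry is one row
# (the default orientation is keyed by column instead)
df_dict = df.to_dict(orient='index')
for row in df_dict:
    for c in combination_generator(df_dict[row].values()):
        combination_count[c] += 1

for c in combination_count:
    print(f'{c} : {combination_count[c]}')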
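As a possible variant (a sketch, not the approach above): `Counter` can also be fed a generator directly, in which case it consumes the combinations one at a time without an intermediate flattened list. The helper name `row_combinations` and the `repeated` filter are assumptions added for illustration; the sample data is the same three-row frame.

```python
from collections import Counter
import itertools

import pandas as pd

df = pd.DataFrame([
    [2, 10, 18, 31, 41],
    [12, 27, 28, 39, 42],
    [12, 4, 18, 6, 41],
])

def row_combinations(frame):
    # Hypothetical helper: lazily yield every sorted combination of every row.
    # itertuples(index=False, name=None) returns each row as a plain tuple.
    for row in frame.itertuples(index=False, name=None):
        for size in range(1, len(row) + 1):
            for combo in itertools.combinations(row, size):
                yield tuple(sorted(combo))

# Counter pulls items from the generator as it counts them.
counts = Counter(row_combinations(df))

# Keep only the combinations that occur in more than one row.
repeated = {combo: n for combo, n in counts.items() if n > 1}
print(repeated)
```

Filtering on `n > 1` keeps the output readable, since most of the 31 combinations per row occur only once in this sample.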