Iterate over the rows of a DataFrame with n rows and n columns and count the frequency of combinations 1:n


Iterate over the rows of a DataFrame that has n rows and 6 columns, and count how often each combination of sizes 1:n occurs.

Non-working template code:

import pandas as pd
import itertools
from collections import Counter

# create sample data
df = pd.DataFrame([
    [2, 10, 18, 31, 41],
   [12, 27, 28, 39, 42]
])

def get_combinations(row)
  all_combinations[]
  for i in range(1, len(df)+1):
    result = list(itertools.combinations(df, i))
    return all_combinations

# get all possible combinations of values in a row
all_rows = df.apply(get_combinations, 1).values
all_rows_flatten = list(itertools.chain.from_iterable(all_rows))

# use Counter to count how many there are of each combination
count_combinations = Counter(all_rows_flatten)
print(all_combinations["count_combinations"])

1 Answer

Your sample data contains only unique combinations, so I changed it slightly.

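Using a generator is more memory-efficient than computing all combinations up front, and iterating over the values only once is more CPU-efficient (chaining built-ins such as flatten, Counter, and so on makes several passes over the data).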
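
To give a rough sense of the memory point: a row with n distinct values has 2**n - 1 non-empty combinations, so a list built per row grows quickly with the number of columns. The sketch below is only illustrative; `row` and the helper name `lazy_combinations` are assumptions, not part of the original code.

```python
import itertools
from math import comb

row = [2, 10, 18, 31, 41]  # one sample row (illustrative)

def lazy_combinations(values):
    # Hypothetical helper: yields combinations one at a time
    # instead of materialising them all in a list.
    for size in range(1, len(values) + 1):
        yield from itertools.combinations(values, size)

# Non-empty combinations of n items: sum of C(n, k) for k = 1..n, i.e. 2**n - 1.
expected = sum(comb(len(row), k) for k in range(1, len(row) + 1))
produced = sum(1 for _ in lazy_combinations(row))
print(expected, produced)
```

Only one combination is alive at a time while the generator is consumed, whereas `list(...)` would hold all of them at once.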

This should do the trick:

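import itertools
from collections import defaultdict

import pandas as pd

df = pd.DataFrame([
    [2, 10, 18, 31, 41],
    [12, 27, 28, 39, 42],
    [12, 4, 18, 6, 41],
])

def combination_generator(values):
    # Yield every combination of sizes 1..len(values), one at a time.
    for i in range(1, len(values) + 1):
        for c in itertools.combinations(values, i):
            # Sort so the same set of values always maps to the same key.
            yield tuple(sorted(c))


combination_count = defaultdict(int)
# orient="index" returns {row_label: {column: value}}. The default
# orientation is keyed by column, so the loop would walk columns, not rows.
df_dict = df.to_dict(orient="index")
for row in df_dict:
    for c in combination_generator(df_dict[row].values()):
        combination_count[c] += 1

for c in combination_count:
    print(f'{c} : {combination_count[c]}')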
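
If only the combinations that occur in more than one row are of interest, the same single-pass idea can also be written with `Counter` fed by a generator expression. This is a sketch rather than a drop-in replacement: it uses a plain list of rows so it runs without pandas, and the names `rows`, `row_combinations`, and `repeated` are illustrative. With a DataFrame, `df.itertuples(index=False, name=None)` should be usable in place of `rows`.

```python
from collections import Counter
from itertools import combinations

# Stand-in for the DataFrame rows from the answer's sample data.
rows = [
    [2, 10, 18, 31, 41],
    [12, 27, 28, 39, 42],
    [12, 4, 18, 6, 41],
]

def row_combinations(values):
    # Sort once per row; combinations() keeps input order,
    # so every yielded tuple is already sorted.
    ordered = sorted(values)
    for size in range(1, len(ordered) + 1):
        yield from combinations(ordered, size)

# Counter consumes the generator lazily, so the data is walked only once.
counts = Counter(combo for row in rows for combo in row_combinations(row))

# Keep only combinations that appear in more than one row.
repeated = {combo: n for combo, n in counts.items() if n > 1}
for combo, n in sorted(repeated.items()):
    print(f"{combo} : {n}")
```

Sorting each row once, rather than sorting every tuple, is a small design choice that relies on `itertools.combinations` emitting elements in input order.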