Python: find all non-repeating combinations of sets and get their common values and their unique values

Question (votes: 0, answers: 3)

I have a list of sets [setA, setB, setC, setD, ..., setX] and I want a way to get the intersection of every combination (without repeating any combination), so:

AB = setA.intersection(setB)
AC = setA.intersection(setC)
AD = setA.intersection(setD)
BC = setB.intersection(setC)
BD = setB.intersection(setD)
CD = setC.intersection(setD)
ABC = setA.intersection(setB.intersection(setC))
ABD = setA.intersection(setB.intersection(setD))
ACD = setA.intersection(setC.intersection(setD))
BCD = setB.intersection(setC.intersection(setD))
ABCD = setA.intersection(setB.intersection(setC.intersection(setD)))

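I also want the values that are unique to each set or combination, meaning values that do not appear in any larger combination that contains it. So: values in setA that are not in AB, AC, AD, ABC, ABD, ACD or ABCD; values in AB that are not in ABC, ABD or ABCD; values in ABC that are not in ABCD; and so on.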

I would like the final output to be a list of tuples, where each tuple looks like this:

(combo_name, unique_values, intersected_set)

So far I have been doing all of this by hand, which is tedious:

import pandas as pd 
setA_name = 'A'
setB_name = 'B'
setC_name = 'C'
setA = {1,2,3,4,5,6,7,8,9,10}
setB = {2,3,7,11,13,17,23}
setC = {3,6,7,9,10,12,13,15,16}
setA_B = setA.intersection(setB)
setA_C = setA.intersection(setC)
setB_C = setB.intersection(setC)
setA_B_C = setA.intersection(setB.intersection(setC))
setA_B_only = setA_B-setA_B_C
setA_C_only = setA_C-setA_B_C    
setB_C_only = setB_C-setA_B_C
setA_only = setA-setA_B_only-setA_C_only-setA_B_C
setB_only = setB-setA_B_only-setB_C_only-setA_B_C
setC_only = setC-setA_C_only-setB_C_only-setA_B_C
results = [
    (setA_name, setA_only, setA),
    (setB_name, setB_only, setB),
    (setC_name, setC_only, setC),
    (';'.join([setA_name, setB_name]), setA_B_only, setA_B),
    (';'.join([setA_name, setC_name]), setA_C_only, setA_C),
    (';'.join([setB_name, setC_name]), setB_C_only, setB_C),
    (';'.join([setA_name, setB_name, setC_name]), setA_B_C, setA_B_C)
    ]
tab = pd.DataFrame(results)
tab.columns = ['Set', 'Unique', 'Common']
print(tab)
     Set        Unique                            Common
0      A  {8, 1, 4, 5}   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1      B  {17, 11, 23}         {2, 3, 7, 11, 13, 17, 23}
2      C  {16, 12, 15}  {3, 6, 7, 9, 10, 12, 13, 15, 16}
3    A;B           {2}                         {2, 3, 7}
4    A;C    {9, 10, 6}                  {3, 6, 7, 9, 10}
5    B;C          {13}                        {3, 13, 7}
6  A;B;C        {3, 7}                            {3, 7}

I don't know where to start.

Updated approach, using @TheEngineerProgrammer's suggestion:

import pandas as pd
from itertools import combinations as combi
set_dict = {'A': {1,2,3,4,5,6,7,8,9,10}, 'B':{2,3,7,11,13,17,23}, 'C':{3,6,7,9,10,12,13,15,16}}
dict_keys = list(set_dict.keys())

common_dict = dict(set_dict)  # shallow copy, so adding combinations does not mutate set_dict
for i, j in combi(dict_keys,2):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    common = i_set.intersection(j_set)
    key_name = ';'.join([i, j])
    common_dict[key_name] = common

for i, j, k in combi(dict_keys,3):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    k_set = set_dict.get(k)
    common = i_set.intersection(j_set.intersection(k_set))
    key_name = ';'.join([i, j, k])
    common_dict[key_name] = common
    

uniq_dict = dict()
for x, y in combi(list(common_dict.keys()),2):
    x_split = x.split(';')
    if all(item in y.split(';') for item in x_split):  # compare names, not substrings
        print(x,'-',y)
        if x in uniq_dict:
            x_set = uniq_dict.get(x)
        else:
            x_set = common_dict.get(x)
        y_set = common_dict.get(y)
        x_uniq = x_set-y_set
        print(x_uniq)
        uniq_dict[x] = x_uniq

for key in set(common_dict.keys())-set(uniq_dict.keys()):
    uniq_dict[key] = common_dict.get(key)

results = []
for key in uniq_dict.keys():
    results.append((key, uniq_dict.get(key), common_dict.get(key)))

tab = pd.DataFrame(results, columns = ['Set', 'Unique', 'Common'])
print(tab)
     Set        Unique                            Common
0      A  {8, 1, 4, 5}   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1      B  {17, 11, 23}         {2, 3, 7, 11, 13, 17, 23}
2      C  {16, 12, 15}  {3, 6, 7, 9, 10, 12, 13, 15, 16}
3    A;B           {2}                         {2, 3, 7}
4    A;C    {9, 10, 6}                  {3, 6, 7, 9, 10}
5    B;C          {13}                        {3, 13, 7}
6  A;B;C        {3, 7}                            {3, 7}

How can I make this part scale with the number of items in set_dict?

for i, j in combi(dict_keys,2):
    ...

for i, j, k in combi(dict_keys,3):
    ...

for i, j, k, l in combi(dict_keys,4):
    ...

for i, j, k, l, m in combi(dict_keys,5):
    ...
python arrays combinations set-intersection
3 Answers
0
votes

I think what you need is combinations. Here is an example:

from itertools import combinations

my_list = ["A", "B", "C", "D"]
for i, j in combinations(my_list, 2):
    print(i+j) #this gives you AB AC AD...

for i, j, k in combinations(my_list, 3):
    print(i+j+k) #this gives you ABC, ABD...
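
The fixed-size loops above can be folded into one by also looping over the combination size. What follows is only a minimal sketch: it assumes the sets live in a dict like the question's set_dict and that names are joined with ';', and the helper name all_intersections is made up for illustration.

```python
from itertools import combinations

def all_intersections(set_dict, sep=";"):
    """Map each combination name to the intersection of its sets (hypothetical helper)."""
    names = list(set_dict)
    common = {}
    # r is the combination size: 1 gives the sets themselves, 2 the pairs, and so on
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # set.intersection accepts any number of sets, so no nested calls are needed
            common[sep.join(combo)] = set.intersection(*(set_dict[n] for n in combo))
    return common

set_dict = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {2, 3, 7, 11, 13, 17, 23},
    "C": {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
common_dict = all_intersections(set_dict)
print(list(common_dict))
```

Because the size r is a loop variable, adding a fourth or fifth set to set_dict needs no new code. The number of combinations still grows as 2**n - 1, so this is only practical for a modest number of sets.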

0
votes

I think you can start with something like this:

sets = {
    'A': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    'B': {2, 3, 7, 11, 13, 17, 23},
    'C': {3, 6, 7, 9, 10, 12, 13, 15, 16},
}

elems = {}

for name, s in sets.items():
    for e in s:
        elems[e] = elems.get(e, '') + name

After that, elems will be a dict mapping each element to the sets that contain it, like

{1: 'A', 2: 'AB', 3: 'ABC'... etc

Now you can easily get the combinations you need:

# only A
print([k for k, v in elems.items() if v == 'A'])
# A & C
print([k for k, v in elems.items() if 'A' in v and 'C' in v])
# A & C only
print([k for k, v in elems.items() if v == 'AC'])
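
Building on the same membership idea, the (combo_name, unique_values, intersected_set) tuples from the question can be assembled by grouping elements by the exact set of names that contain them. This is a sketch under assumptions, not part of the original answer: it keys on a frozenset of names rather than a concatenated string (so multi-character names stay unambiguous), and the function name combos_from_membership and the ';' separator are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def combos_from_membership(sets, sep=";"):
    # element -> names of all sets that contain it
    membership = defaultdict(set)
    for name, s in sets.items():
        for e in s:
            membership[e].add(name)

    # exact group of names -> elements found in exactly those sets
    exact = defaultdict(set)
    for e, names in membership.items():
        exact[frozenset(names)].add(e)

    results = []
    all_names = list(sets)
    for r in range(1, len(all_names) + 1):
        for combo in combinations(all_names, r):
            key = frozenset(combo)
            unique = exact.get(key, set())
            # common values: every element whose membership includes all names in combo
            common = set()
            for group, elems in exact.items():
                if key <= group:
                    common |= elems
            results.append((sep.join(combo), unique, common))
    return results

sets = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {2, 3, 7, 11, 13, 17, 23},
    "C": {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
for row in combos_from_membership(sets):
    print(row)
```

Each element is scanned once to build the membership map, so the set arithmetic per combination reduces to unions of precomputed groups.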

0
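votes

The requirement you describe is a bit hard to follow, especially the "unique values in the different sets" part. I think I understand it now, but I can't picture what real-world use the result of this set operation would have, so I invite you to review the requirement and check that this is really what you need.

Anyway, here it is below. With this approach you can have as many sets as you like without having to declare an exponential number of variables to handle them (incidentally, a term you may find useful: you are operating on the "powerset" of your sets).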

from itertools import combinations
from functools import reduce


sets = {
    "A": {1,2,3,4,5,6,7,8,9,10},
    "B": {2,3,7,11,13,17,23},
    "C": {3,6,7,9,10,12,13,15,16},
}

for r in range(len(sets)):
    for combo in combinations(sets, r+1):
        name = "".join(combo)
        intersection = reduce(set.intersection, (sets[n] for n in combo))
        unique = reduce(set.difference, (sets[n] for n in sets if n not in combo), intersection)
        print(name, unique, intersection)

Output:

A   {8, 1, 4, 5} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
B   {17, 11, 23} {17, 2, 3, 23, 7, 11, 13}
C   {16, 12, 15} {3, 6, 7, 9, 10, 12, 13, 15, 16}
AB  {2}          {2, 3, 7}
AC  {9, 10, 6}   {3, 6, 7, 9, 10}
BC  {13}         {3, 13, 7}
ABC {3, 7}       {3, 7}
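
One way to sanity-check this output is to note that the "unique" sets should partition the data: every element belongs to exactly one of them. The sketch below wraps the same reduce-based loop in a function that returns the list of tuples the question asks for, then checks that property. The function name combo_table is an assumption made for illustration.

```python
from functools import reduce
from itertools import combinations

def combo_table(sets):
    """Return (name, unique, intersection) tuples for every non-empty combination."""
    rows = []
    for r in range(1, len(sets) + 1):
        for combo in combinations(sets, r):
            intersection = reduce(set.intersection, (sets[n] for n in combo))
            # remove everything that also appears in a set outside the combination
            others = (sets[n] for n in sets if n not in combo)
            unique = reduce(set.difference, others, intersection)
            rows.append(("".join(combo), unique, intersection))
    return rows

sets = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {2, 3, 7, 11, 13, 17, 23},
    "C": {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
rows = combo_table(sets)

uniques = [unique for _, unique, _ in rows]
everything = set().union(*sets.values())
# the unique sets cover every element ...
covers_all = set().union(*uniques) == everything
# ... and no element is counted twice
no_overlap = sum(len(u) for u in uniques) == len(everything)
print(covers_all, no_overlap)
```

Both checks hold for the example data, which has 17 distinct elements spread over the seven unique sets.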