I have a list of sets [setA, setB, setC, setD, ..., setX] and I want a way to get the intersection of every combination of them (without duplicate combinations), like this:
AB = setA.intersect(setB)
AC = setA.intersect(setC)
AD = setA.intersect(setD)
BC = setB.intersect(setC)
BD = setB.intersect(setD)
CD = setC.intersect(setD)
ABC = setA.intersect(setB.intersect(setC))
ABD = setA.intersect(setB.intersect(setD))
ACD = setA.intersect(setC.intersect(setD))
BCD = setB.intersect(setC.intersect(setD))
ABCD = setA.intersect(setB.intersect(setC.intersect(setD)))
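I also want the unique values of each set and each combination, meaning the values that do not appear in any larger combination that contains it. Ergo: the values in setA that are not in AB, AC, AD, ABC, ABD, ACD, or ABCD; the values in AB that are not in ABC, ABD, or ABCD; the values in ABC that are not in ABCD; and so on.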
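Stated as a formula (a restatement, assuming the sets are written X_A, X_B, ... and indexed by their names): for a combination S of set names, the intersected set is what every set in S shares, and the unique values are the ones that appear in no set outside S. Removing every larger combination that contains S is equivalent to removing the union of the remaining sets, because a value shared with some set j outside S is exactly a value of the combination S ∪ {j}:

```latex
\mathrm{common}(S) = \bigcap_{i \in S} X_i, \qquad
\mathrm{unique}(S) = \mathrm{common}(S) \setminus \bigcup_{j \notin S} X_j
```

For example, unique({A}) = X_A \ (X_B ∪ X_C ∪ X_D), which is "the values in setA not in AB, AC, AD, ABC, ABD, ACD, or ABCD".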
I want the final output to be a list of tuples, where each tuple looks like this:
(combo_name, unique_values, intersected_set)
So far I have been doing this by hand, which is tedious:
import pandas as pd
setA_name = 'A'
setB_name = 'B'
setC_name = 'C'
setA = {1,2,3,4,5,6,7,8,9,10}
setB = {2,3,7,11,13,17,23}
setC = {3,6,7,9,10,12,13,15,16}
setA_B = setA.intersection(setB)
setA_C = setA.intersection(setC)
setB_C = setB.intersection(setC)
setA_B_C = setA.intersection(setB.intersection(setC))
setA_B_only = setA_B-setA_B_C
setA_C_only = setA_C-setA_B_C
setB_C_only = setB_C-setA_B_C
setA_only = setA-setA_B_only-setA_C_only-setA_B_C
setB_only = setB-setA_B_only-setB_C_only-setA_B_C
setC_only = setC-setA_C_only-setB_C_only-setA_B_C
results = [
(setA_name, setA_only, setA),
(setB_name, setB_only, setB),
(setC_name, setC_only, setC),
(';'.join([setA_name, setB_name]), setA_B_only, setA_B),
(';'.join([setA_name, setC_name]), setA_C_only, setA_C),
(';'.join([setB_name, setC_name]), setB_C_only, setB_C),
(';'.join([setA_name, setB_name, setC_name]), setA_B_C, setA_B_C)
]
tab = pd.DataFrame(results)
tab.columns = ['Set', 'Unique', 'Common']
print(tab)
Set Unique Common
0 A {8, 1, 4, 5} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1 B {17, 11, 23} {2, 3, 7, 11, 13, 17, 23}
2 C {16, 12, 15} {3, 6, 7, 9, 10, 12, 13, 15, 16}
3 A;B {2} {2, 3, 7}
4 A;C {9, 10, 6} {3, 6, 7, 9, 10}
5 B;C {13} {3, 13, 7}
6 A;B;C {3, 7} {3, 7}
I don't know where to start.
Updated approach using @TheEngineerProgrammer's suggestion:
import pandas as pd
from itertools import combinations as combi

set_dict = {'A': {1,2,3,4,5,6,7,8,9,10}, 'B':{2,3,7,11,13,17,23}, 'C':{3,6,7,9,10,12,13,15,16}}
dict_keys = list(set_dict.keys())
# Copy so that adding combination keys does not also modify set_dict
common_dict = dict(set_dict)
# Pairwise intersections
for i, j in combi(dict_keys,2):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    common = i_set.intersection(j_set)
    key_name = ';'.join([i, j])
    common_dict[key_name] = common
# Three-way intersections
for i, j, k in combi(dict_keys,3):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    k_set = set_dict.get(k)
    common = i_set.intersection(j_set.intersection(k_set))
    key_name = ';'.join([i, j, k])
    common_dict[key_name] = common
# For each key x, subtract every later key y whose sets include all of x's sets
uniq_dict = dict()
for x, y in combi(list(common_dict.keys()),2):
    x_split = x.split(';')
    if all(item in y.split(';') for item in x_split):
        if x in uniq_dict:
            x_set = uniq_dict.get(x)
        else:
            x_set = common_dict.get(x)
        y_set = common_dict.get(y)
        x_uniq = x_set-y_set
        uniq_dict[x] = x_uniq
# Keys with no larger combination (here A;B;C) are unique as a whole
for key in set(common_dict.keys())-set(uniq_dict.keys()):
    uniq_dict[key] = common_dict.get(key)
results = []
for key in uniq_dict.keys():
    results.append((key, uniq_dict.get(key), common_dict.get(key)))
tab = pd.DataFrame(results, columns = ['Set', 'Unique', 'Common'])
print(tab)
Set Unique Common
0 A {8, 1, 4, 5} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1 B {17, 11, 23} {2, 3, 7, 11, 13, 17, 23}
2 C {16, 12, 15} {3, 6, 7, 9, 10, 12, 13, 15, 16}
3 A;B {2} {2, 3, 7}
4 A;C {9, 10, 6} {3, 6, 7, 9, 10}
5 B;C {13} {3, 13, 7}
6 A;B;C {3, 7} {3, 7}
How can I make this part scale with the number of entries in set_dict, instead of writing one loop per combination size?
for i, j in combi(dict_keys,2):
    ...
for i, j, k in combi(dict_keys,3):
    ...
for i, j, k, l in combi(dict_keys,4):
    ...
for i, j, k, l, m in combi(dict_keys,5):
    ...
I think what you need is combinations. Here is an example:
from itertools import combinations
my_list = ["A", "B", "C", "D"]
for i, j in combinations(my_list, 2):
    print(i+j)  # this gives you AB AC AD...
for i, j, k in combinations(my_list, 3):
    print(i+j+k)  # this gives you ABC, ABD...
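The snippet above still needs one hand-written loop per combination size, which is what the question asks to avoid. A minimal sketch of the generalization, assuming the question's set_dict: loop over the size r and keep each combination as a tuple instead of unpacking it into i, j, k, so the same code handles any number of sets.

```python
from itertools import combinations

# set_dict is taken from the question above.
set_dict = {
    'A': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    'B': {2, 3, 7, 11, 13, 17, 23},
    'C': {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
keys = list(set_dict)

# Copy instead of aliasing, so adding combination keys does not mutate set_dict.
common_dict = dict(set_dict)
# One loop over the combination size r replaces the separate
# 2-, 3-, 4-, ... loops; each combo is a tuple of length r.
for r in range(2, len(keys) + 1):
    for combo in combinations(keys, r):
        # set.intersection accepts any number of sets as positional arguments.
        common_dict[';'.join(combo)] = set.intersection(*(set_dict[k] for k in combo))

print(list(common_dict))
# ['A', 'B', 'C', 'A;B', 'A;C', 'B;C', 'A;B;C']
```

Because the keys keep insertion order, common_dict lists single sets first, then pairs, then triples, which is the same order the question's uniq_dict loop relies on.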
I think you could start with something like this:
sets = {
    'A': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    'B': {2, 3, 7, 11, 13, 17, 23},
    'C': {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
elems = {}
for name, s in sets.items():
    for e in s:
        elems[e] = elems.get(e, '') + name
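After that, elems will be a dict mapping each element to the names of the sets that contain it, like {1: 'A', 2: 'AB', 3: 'ABC', ...}.
Now you can easily get the combinations you need: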
# only A
print([k for k, v in elems.items() if v == 'A'])
# A & C
print([k for k, v in elems.items() if 'A' in v and 'C' in v])
# A & C only
print([k for k, v in elems.items() if v == 'AC'])
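To build the (combo_name, unique_values, intersected_set) tuples the question asks for from elems, one possible sketch groups elements by their membership string: the unique values of a combination are the elements whose string equals its name, and the intersected set is every element whose string contains all of its names. Like the snippet above, it assumes single-character set names joined in dict order; by_signature is a name introduced here for illustration.

```python
from collections import defaultdict
from itertools import combinations

sets = {
    'A': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    'B': {2, 3, 7, 11, 13, 17, 23},
    'C': {3, 6, 7, 9, 10, 12, 13, 15, 16},
}

# Same idea as above: map each element to the names of the sets containing it.
elems = {}
for name, s in sets.items():
    for e in s:
        elems[e] = elems.get(e, '') + name

# Group elements by their exact membership string ('A', 'AB', 'ABC', ...).
by_signature = defaultdict(set)
for e, sig in elems.items():
    by_signature[sig].add(e)

results = []
for r in range(1, len(sets) + 1):
    for combo in combinations(sets, r):
        # Assumes single-character names, so ''.join matches the signatures.
        name = ''.join(combo)
        # Unique values: members of exactly these sets and no others.
        unique = set(by_signature.get(name, ()))
        # Intersection: every element whose signature contains all of combo.
        common = {e for e, sig in elems.items() if all(c in sig for c in combo)}
        results.append((name, unique, common))

for row in results:
    print(row)
```

Building elems is a single pass over all elements; the intersection comprehension rescans elems per combination, which is fine for a handful of sets.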
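The requirement you describe is a bit hard to follow, especially the "unique values in the different sets" part. I think I understand it now, but I can't imagine a real-world use for the result of that set operation, so I'd invite you to review the requirement and make sure this is really what you need.
Anyway, here it is. With this approach you can have as many sets as you like without having to declare an exponential number of variables to work with them (by the way, here is a term you may find useful: you are operating on the "powerset" of your sets).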
from itertools import combinations
from functools import reduce
sets = {
    "A": {1,2,3,4,5,6,7,8,9,10},
    "B": {2,3,7,11,13,17,23},
    "C": {3,6,7,9,10,12,13,15,16},
}
for r in range(len(sets)):
    for combo in combinations(sets, r+1):
        name = "".join(combo)
        intersection = reduce(set.intersection, (sets[n] for n in combo))
        unique = reduce(set.difference, (sets[n] for n in sets if n not in combo), intersection)
        print(name, unique, intersection)
Output:
A {8, 1, 4, 5} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
B {17, 11, 23} {17, 2, 3, 23, 7, 11, 13}
C {16, 12, 15} {3, 6, 7, 9, 10, 12, 13, 15, 16}
AB {2} {2, 3, 7}
AC {9, 10, 6} {3, 6, 7, 9, 10}
BC {13} {3, 13, 7}
ABC {3, 7} {3, 7}
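To get the list of tuples from the question rather than printed lines, the loop can be wrapped in a function. The sketch below is a variation on this answer, not its exact code: powerset_nonempty is adapted from the powerset recipe in the itertools documentation (skipping the empty set), combo_table is a hypothetical name, and set.difference is called with all remaining sets at once instead of through reduce.

```python
from itertools import chain, combinations


def powerset_nonempty(iterable):
    """Non-empty subsets, adapted from the itertools `powerset` recipe."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))


def combo_table(sets, sep=';'):
    """Return a list of (combo_name, unique_values, intersected_set) tuples."""
    results = []
    for combo in powerset_nonempty(sets):
        # With a single set, set.intersection returns a new copy,
        # so the caller's input sets are never returned directly.
        common = set.intersection(*(sets[n] for n in combo))
        others = [sets[n] for n in sets if n not in combo]
        # difference accepts several sets at once (or none, giving a copy).
        unique = common.difference(*others)
        results.append((sep.join(combo), unique, common))
    return results


sets = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {2, 3, 7, 11, 13, 17, 23},
    "C": {3, 6, 7, 9, 10, 12, 13, 15, 16},
}
for row in combo_table(sets):
    print(row)
```

Since it returns a list of tuples, the result can be passed directly to pd.DataFrame(combo_table(sets), columns=['Set', 'Unique', 'Common']) as in the question.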