我有一个元组列表:
[(10,22), (10,20), (10,69), (34,18), (18,17), (89,990), (86,80), (174,175), (543,542)]
我想得到这样的结果:
[(10,22,20,69), (34,18,17), (89,990), (86, 80), (174,175), (543,542)]
我想将至少有一个共同元素的任何元组组合在一起。我该怎么做?
写的[在问题的初始版本中,后来被编辑]听起来你想要传递闭包,但实际上你还需要一个对称闭包,这样你就可以合并
(10,22)
,(10,20)
和(10,69)
.
def merge_sets(sets):
for i in range(len(sets) - 1, -1, -1):
for j in range(i - 1, -1, -1):
if sets[i] & sets[j]: # check for intersection (non-empty -> True)
sets[j] |= sets[i] # merge i-th set into j-th set
sets.pop(i) # remove i-th set
break # terminate inner loop and continue with next i
return sets
tuples = [(10,22), (10,20), (10,69), (34,18), (18,17), (89,990), (86,80), (174,175), (543,542)]
sets = [set(tuple) for tuple in tuples]
merged_sets = merge_sets(sets)
merged_tuples = [tuple(s) for s in merged_sets]
# merged_tuples: [(20, 69, 22, 10), (17, 34, 18), (89, 990), (80, 86), (174, 175), (542, 543)]
这段代码很简单。它首先将元组转换为集合,以便我们可以方便地检查共享元素(集合交集)。最后,我们将集合转换回元组。
函数
merge_sets
将所有集合相互比较。每当两个集合相交时,我们合并它们并继续。
三点值得进一步解释:
[(1, 2), (3, 4), (2, 3)]
(从左到右遍历)或 [(2, 3), (3, 4), (1, 2)]
(从右到左遍历,如在我的代码中)。)因为我们合并集合的方向与循环进行的方向相同(此处:从右到左;选择在算法上是任意的,但请参阅下一个要点),我们可以简单地继续循环,而不必在每次合并后都在较早的时间点重新启动它们。list
元素比从左侧更快。 (为了从左边快速移除,我们可以使用 collections.deque
。)map
)的原因是我们正在处理可变对象(我们将修改列表以及包含项目)。即使使用传统的循环,我们也需要小心地以正确的方式修改项目(通过在循环进程的方向上合并它们;参见第一个要点)。感谢用户 Crazy Chucky 提醒我非空集的计算结果为
True
.
将 compatible 元组链接在一起,即至少共享一个元素的元组。每条链将是原始数据的连接组件。
def cc_from_tuples(pairs:list[tuple]) -> list[tuple]:
"""Get Connected Components"""
ccs = []
# new instance of the data
s_pairs = set(pairs) # or pairs.copy()
# iterative classification of the pairs
while s_pairs:
# initialize connected component
cc = set(s_pairs.pop())
# temporary set
s_pairs_tmp = set()
# classify tuples as either part of the cc or as next candidate cc
for t in s_pairs:
if cc.intersection(t):
cc.update(t)
else:
s_pairs_tmp.add(t)
# add completed connected component
ccs.append(tuple(cc)) # casting
# iteration step
s_pairs = s_pairs_tmp
return css
ts = [(10,22), (10,20), (10,69), (34,18), (18,17), (89,990), (86,80), (174,175), (543,542)]
cc_from_tuples(ts)
#[(17, 18, 34), (10, 20, 69, 22), (542, 543), (89, 990), (80, 86), (174, 175)]
ts = [(1, 0), (2, 3), (1, 4)]
cc_from_tuples(ts)
#[(0, 1, 4), (2, 3)]
ts = [(1, 0), (2, 3), (1, 2)]
cc_from_tuples(ts)
#[(0, 1, 2, 3)]
如果结果需要按元组长度递减排序,修改函数的返回语句如下
return sorted(ccs, key=len, reverse=True)
我想知道您是否可以通过采用 Lover of Structure 的答案 背后的想法并移动 O(n^2) 部分以对整数进行运算来加快速度。这是一个尝试。在这个版本中,我不需要假设输入是一个集合列表:
from collections import defaultdict
tuples = [(10, 22), (10, 20), (10, 69), (34, 18), (18, 17),
(89, 990), (86, 80), (174, 175), (543, 542)]
values_to_index = defaultdict(int)
# This part is a single pass over the elements
for i, t in enumerate(tuples):
for v in t:
values_to_index[v] |= (1 << i)
# Merging the sets is done with integer operations
# Instead of popping elements, it seems simpler to construct
# a new list, since that will be shorter than the input,
# reducing the number of searches
merged_sets = []
for s in values_to_index.values():
# To avoid unnecessary checks for disjointness, keep track of the
# elements that merge pre-existing sets and clean up as you go
merges = []
for i in range(len(merged_sets)):
if merged_sets[i] & s:
merged_sets[i] |= s
merges.append(i)
if len(merges) == 0:
merged_sets.append(s)
elif len(merges) > 1:
k = merges[0]
for i in merges[:0:-1]:
merged_sets[k] |= merged_sets.pop(i)
# Now we have a list of integers telling us which set each tuple goes to
output = [set() for _ in range(len(merged_sets))]
for output_index, bits in enumerate(merged_sets):
while bits:
b = bits & (~bits + 1)
output[output_index].update(tuples[b.bit_length() - 1])
bits ^= b
result = [tuple(s) for s in output]
提供