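I'm working with a dataset in Python that contains a list of tuples, each representing an interval (start, end). Some of these intervals overlap, and I need to merge each group of overlapping intervals into a single interval that covers their whole range. The goal is to reduce the list so that it contains only non-overlapping intervals.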
For example, given the list
[(1, 3), (2, 4), (5, 7), (6, 8)]
the desired output is
[(1, 4), (5, 8)]
Here is what I have tried so far:
def merge_intervals(intervals):
    sorted_intervals = sorted(intervals, key=lambda x: x[0])
    merged = []
    for interval in sorted_intervals:
        if not merged or merged[-1][1] < interval[0]:
            merged.append(interval)
        else:
            merged[-1] = (merged[-1][0], max(merged[-1][1], interval[1]))
    return merged

my_intervals = [(1, 3), (2, 4), (5, 7), (6, 8)]
print(merge_intervals(my_intervals))
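This solution seems to work, but I'm concerned about its efficiency, especially for very large lists of intervals. I'm looking for suggestions on optimizing the algorithm for better performance, or for a more "Pythonic" way to solve the problem. I'm also curious whether Python has a built-in function or library that simplifies this task.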
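You can do this in a linear sweep, without sorting. Note that the sweep visits every integer between the smallest start and the largest end, so the cost is linear in the number of intervals plus the width of that range, and it assumes integer endpoints.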
from collections import defaultdict
from typing import Sequence

my_intervals = [(1, 3), (2, 4), (5, 7), (6, 8), (2, 3), (2, 3), (2, 4)]


def merger(intervals: Sequence[tuple]) -> list[tuple]:
    if not intervals:  # nothing to merge
        return []
    # Net number of intervals opening (+1) or closing (-1) at each point
    breakpoints = defaultdict(int)
    min_val, max_val = intervals[0]
    for start, end in intervals:
        if end < start:
            raise ValueError(f"Improper interval: {start} !<= {end}")
        breakpoints[start] += 1
        breakpoints[end] -= 1
        min_val = min(min_val, start)
        max_val = max(max_val, end)

    res = []
    current_start = min_val
    marker = 0  # marker is > 0 when in an interval, else "dead space"
    for i in range(min_val, max_val + 1):
        marker += breakpoints.get(i, 0)
        # Compare against None so that a start value of 0 is not mistaken for "no interval"
        if current_start is not None and marker == 0:  # close the interval
            res.append((current_start, i))
            current_start = None
        elif current_start is None and marker > 0:  # start an interval
            current_start = i
    return res


if __name__ == "__main__":
    res = merger(my_intervals)
    print(res)
[(1, 4), (5, 8)]
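If the endpoints are floats, or the values are spread over a very wide range, walking every integer between the minimum and the maximum is no longer practical. A possible variant, sketched here under that assumption, keeps the same breakpoint counter but visits only the distinct endpoints in sorted order. This brings sorting back, so it is O(n log n) rather than linear. The name merge_by_events is hypothetical and used only for illustration.

```python
from collections import defaultdict


def merge_by_events(intervals):
    """Merge overlapping or touching (start, end) intervals.

    Hypothetical variant of the breakpoint counter: it sorts the distinct
    endpoints instead of visiting every integer between min and max.
    """
    deltas = defaultdict(int)
    for start, end in intervals:
        if end < start:
            raise ValueError(f"Improper interval: {start} > {end}")
        deltas[start] += 1  # one more interval is open from here
        deltas[end] -= 1    # one interval closes here

    merged = []
    open_count = 0          # number of intervals covering the current point
    current_start = None
    for point in sorted(deltas):
        open_count += deltas[point]
        if current_start is None and open_count > 0:
            current_start = point                  # entering covered territory
        elif current_start is not None and open_count == 0:
            merged.append((current_start, point))  # leaving it
            current_start = None
    return merged


print(merge_by_events([(1, 3), (2, 4), (5, 7), (6, 8)]))
print(merge_by_events([(0.5, 2.5), (2.0, 1_000_000.0)]))
```

In this sketch, intervals that merely touch, such as (0, 2) and (2, 5), are merged into one, and a zero-length interval such as (2, 2) that stands alone contributes nothing to the result, because its +1 and -1 cancel at the same point.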