将整数数组拆分为具有 min 和 max 之间的最大差值之和的子数组

问题描述 投票:0回答:2

我正在尝试找到有效解决这个问题的算法:

给定一个未排序的数字数组,需要将其分成长度从a到b的多个子数组,使得每个子数组中的最小和最大数字的差值之和最大。必须保留数字的顺序。

示例:

a = 3, b = 7
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
answer: [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]] (diff sum is 12)

a = 3, b = 4
input: [1, 6, 2, 2, 5, 2, 8, 1, 5, 6]
answer: [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]] (diff sum is 16)

a = 4, b = 5
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2]
answer: splitting is impossible

到目前为止我想出的唯一解决方案是尝试所有可能的子数组组合。

from collections import deque

def partition_array(numbers, min_len, max_len):
  max_diff_subarray = None

  queue = deque()

  for end in range(min_len - 1, max_len):
    if end < len(numbers):
      diff = max(numbers[0:end + 1]) - min(numbers[0:end + 1])
      queue.append(Subarray(previous=None, start=0, end=end, diff_sum=diff))

  while queue:
    subarray = queue.popleft()

    if subarray.end == len(numbers) - 1:
      if max_diff_subarray is None:
        max_diff_subarray = subarray
      elif max_diff_subarray.diff_sum < subarray.diff_sum:
        max_diff_subarray = subarray
      continue

    start = subarray.end + 1

    for end in range(start + min_len - 1, start + max_len):
      if end < len(numbers):
        diff = max(numbers[start:end + 1]) - min(numbers[start:end + 1])
        queue.append(Subarray(previous=subarray, start=start, end=end, diff_sum=subarray.diff_sum + diff))
      else:
        break

  return max_diff_subarray

class Subarray:
  def __init__(self, previous=None, start=0, end=0, diff_sum=0):
    self.previous = previous
    self.start = start
    self.end = end
    self.diff_sum = diff_sum

numbers = [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
a = 3
b = 7
result = partition_array(numbers, a, b)
print(result.diff_sum)

有没有更省时的解决方案?

python algorithm
2个回答
4
投票

首先我们来解决一个更简单的问题。让我们遍历一个数组,并给出所有固定大小窗口的最小值和最大值。

def window_mins_maxes (size, array):
    min_values = deque()
    min_positions = deque()
    max_values = deque()
    max_positions = deque()

    for i, value in enumerate(array):
        if size <= i:
            yield (i, min_values[0], max_values[0])
            if min_positions[0] <= i - size:
                min_values.popleft()
                min_positions.popleft()

            if max_positions[0] <= i - size:
                max_values.popleft()
                max_positions.popleft()

        while 0 < len(min_values) and value <= min_values[-1]:
            min_values.pop()
            min_positions.pop()
        min_values.append(value)
        min_positions.append(i)

        while 0 < len(max_values) and max_values[-1] <= value:
            max_values.pop()
            max_positions.pop()
        max_values.append(value)
        max_positions.append(i)

    yield (len(array), min_values[0], max_values[0])

这显然需要记忆

O(size)
。不太明显的是,处理长度为
O(n)
的数组需要时间
n
。但我们可以通过摊销分析看到这一点。对于每个元素,我们将归因于检查小于它的可能值的成本、某些后续元素检查它是否应该被删除的成本以及添加的成本。这说明了所有操作(尽管这不是它们发生的顺序),并且每个元素的工作量是固定的。

另请注意,这部分解决方案所需的内存在

O(n)
范围内。

到目前为止,我认为这是一个众所周知的动态规划问题。现在让我们让它更具挑战性。

我们将把分区问题当作传统的动态规划问题来解决。我们将构建一个数组

best_weight
表示该点的最佳分区,以及
prev_index
表示在该点之前结束的上一个分区的开始位置。

为了构建它,我们将使用上述算法获取先前的分区并向其中添加

min_len
之一。如果它比之前的更好,我们会将其信息保存在这些数组中。然后,我们将从该分区向前扫描,直到
max_len
。然后我们继续进行下一个可能的分区开始。

完成后,我们将从该代码中找到答案。

看起来像这样:

def partition_array(numbers, min_len, max_len):
    if max_len < min_len or len(numbers) < min_len:
        return (None, None)

    best_weight = [None for _ in numbers]
    prev_index = [None for _ in numbers]

    # Need an extra entry for off of the end of the array.
    best_weight.append(None)
    prev_index.append(None)

    best_weight[0] = 0

    for i, min_value, max_value in window_mins_maxes(min_len, numbers):
        window_start_weight = best_weight[i - min_len]
        if window_start_weight is not None:
            j = i
            while j - i < max_len - min_len and j < len(numbers):
                new_weight = window_start_weight + max_value - min_value
                if best_weight[j] is None or best_weight[j] < new_weight:
                    best_weight[j] = new_weight
                    prev_index[j] = i - min_len

                if numbers[j] < min_value:
                    min_value = numbers[j]
                if max_value < numbers[j]:
                    max_value = numbers[j]
                j += 1

            # And fill in the longest value.
            new_weight = window_start_weight + max_value - min_value
            if best_weight[j] is None or best_weight[j] < new_weight:
                best_weight[j] = new_weight
                prev_index[j] = i - min_len

    if best_weight[-1] is None:
        return (None, None)
    else:
        path = [len(numbers)]
        while prev_index[path[-1]] is not None:
            path.append(prev_index[path[-1]])
        path = list(reversed(path))
        partitioned = [numbers[path[i]:path[i+1]] for i in range(len(path)-1)]
        return (best_weight[-1], partitioned)

请注意,我们对每个可能的开始和长度都进行

O(1)
工作。这就是时间
O((max_len + 1 - min_len)*n)
。我们使用的数据结构的大小都以
O(n)
为界。提供我在评论中承诺的整体效率。

现在让我们测试一下。

print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1], 3, 7))
print(partition_array([1, 6, 2, 2, 5, 2, 8, 1, 5, 6], 3, 4))
print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2], 4, 5))

输出为:

(12, [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]])
(16, [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]])
(None, None)

0
投票

有没有更省时的解决方案?

是的,有。

btilly的答案表明,问题代码中提出的算法需要随着数组大小呈指数增加时间,可以通过具有几乎线性时间依赖性的动态编程方法来击败。

还有可能做得更好吗?还有进步的空间吗?

让我们从构建以下测试用例开始,其中三个不同的分区选项提供相同的最佳结果,并且整数列表的大小允许可靠地测量所需的计算时间:

from time import perf_counter as T from random import randint, seed seed(2); arr=[] for _ in range(100000): arr.append(randint(1,10000000)) sT=T(); result=partition_array(arr, 49999, 50001); eT=T() print(eT-sT) print("btl", result[0], result[1] , end='')
并使用 

btilly 提供的代码运行它。

在我的机器上我得到的输出:

0.10844088008161634 btl 19998737 [0, 49999, 100000]
btl(

btilly的快捷方式)代码需要在运行Python 3.10.12的Linux Mint 21.2 Xfce Thinkpad笔记本上大约0.1秒得出解决方案,并使用window_mins_maxes()

函数完成大部分工作,因此对其进行了优化使用 
deque
 模块中的 
collections
 数据类型。该代码输出最佳总和以及整数列表分区的可能多个解决方案之一。

window_mins_maxes()

 的另一个版本下面,我将其命名为 
winMinMaxLargeSize()
,以将其与更简单的 
winMinMax()
 代码区分开来,该代码在小整数列表上运行速度非常快,但在大整数列表上运行速度很慢:

def winMinMax(size, array): for i in range(size, len(array)+1): yield i, min(array[ i-size : i ]), max(array[ i -size : i]) def winMinMaxLargeSize(size, L): """ Let's optimize min/max calculation by re-using already got values what starts to make sense probably only at larger sizes (for example greater than 7). This function moves a "Window" with width equal 'size' over the list 'L' calculating the min/max values and remembering their list "L' indices. The indices are used then to skip recalculating of min/max values in case the value "leaving" the "Window" has no impact on the min/max values because in the range between min and max. The variable names storing these indices are 'indxMin' and 'indxMax' """ # Lets calculate the min/max values for the first "Window" at the beginning of 'L' to # have a reference start values for "moving" of the "Window" over the list 'L' : prevMin, prevMax = min(L[0 : size ]), max(L[0: size]) lenL=len(L) indxCounterMin=0; indxCounterMax=0 for i in range(size)[::-1] : if indxCounterMin == 0 and L[i] == prevMin : indxMin = i indxCounterMin+= 1 if indxCounterMax == 0 and L[i] == prevMax : indxMax = i indxCounterMax+= 1 if indxCounterMax and indxCounterMin : # index positions of min/max are found break # so we may exit the search loop . #print(f"i={size=} , {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}") yield size, prevMin, prevMax for i in range(size+1, lenL+1): if ( i - size - 1 ) == indxMin: prevMin=min(L[ i - size : i ]) for indx in range( i - size, i )[::-1]: if L[indx] == prevMin : indxMin = indx break elif ( i - size - 1 ) == indxMax: prevMax=max(L[ i - size : i ]) for indx in list(range(i - size , i ))[::-1]: if L[indx] == prevMax : indxMax = indx break else: if L[ i - 1 ] <= prevMin : indxMin = i - 1 if L[ i - 1 ] < prevMin : prevMin = L[ i - 1 ] elif L[ i - 1 ] >= prevMax : indxMax = i - 1 if L[ i - 1 ] > prevMax : prevMax = L[ i - 1 ] #print(f" > {i=:2d}, {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}") yield i, prevMin, prevMax
将“btl”代码中使用的函数替换为上面的函数会得到以下输出:

0.05169699003454298 btl-opt 19998737 [0, 49999, 100000]
哇!通过代码的“btl-opt”优化,可以将时间缩短一半并达到 

0.05 秒。

*这是否已经是所能达到的极限,还是还有改进的空间?

让我们把动态规划方法放在一边,使用另一种方法,通过“暴力”来解决问题,并检查所有可能的列表分区情况。作为副作用,它会给出多个分区解决方案(如果有的话):

def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum(L, targetSum, n=None, m=None): if n is None: n = 1 if m is None: m = targetSum lenL = len(L) # print(f"{targetSum=}, {L=}, {n=}, {m=}") Stack = [] # Initialize the Stack with the starting point for each element in L for i in range(lenL): currentSum = L[ i ] path = [ L[ i ] ] start = 0 # Start from 0 allows revisiting of all items Stack.append( (currentSum, path, start ) ) while Stack: currentSum, path, start = Stack.pop() # Check if the current path meets the criteria if currentSum == targetSum and n <= len(path) <= m: yield path if currentSum > targetSum or len(path) > m: continue # ^ - NEXT please: stop exploring this path as it's not valid or complete # Continue to build the path by adding elements from L, starting from 0 index for i in range(len(L)): # Change start to 0 if you want to include all permutations newSum = currentSum + L[ i ] newPath = path + [ L[ i ] ] Stack.append((newSum, newPath, 0)) # Start from 0 allows every possibility # def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum splitsGenerator = allArrangementsOfIntegerItemsInLsummingUpTOtargetSum def arrSplit(Arr, m, n) : targetSum=len(Arr) L=list(range( m, n+1) ) bestSum = None for split in splitsGenerator( L, targetSum ) : #print("###", split) startIndx = 0 endIndx = 0 currSum = 0 for indx in split : endIndx += indx currSum = currSum + max(Arr[startIndx : endIndx ]) - min(Arr[ startIndx : endIndx ]) startIndx=startIndx + endIndx if bestSum is None: bestSum = currSum bestSplit = [ split ] continue if bestSum <= currSum : if bestSum == currSum : bestSplit.append(split) else: bestSum = currSum bestSplit = [ split ] return bestSum, bestSplit targetSum = 10000 from time import perf_counter as T from random import randint, seed seed(2); arr=[] for _ in range(100000): arr.append(randint(1,10000000)) sT=T(); result=arrSplit(arr, 49999, 50001); eT=T() print(eT-sT) print("oOo", result[0], result[1], end='')
代码输出:

0.012507450999692082 oOo 19998737 [[50001, 49999], [50000, 50000], [49999, 50001]]
并且击败了已经非常快的代码,我们已经开始了旅程,速度快了十倍,时间为

0.01秒。


这还可以进一步提高一个数量级吗?我很确定它可以比这个更进一步,所以仍然有很大的改进空间,并且可以用更快的代码来继续这个旅程。

愿oOo的力量与你同在!

© www.soinside.com 2019 - 2024. All rights reserved.