将整数数组拆分为具有 min 和 max 之间的最大差值之和的子数组

Question

我正在尝试找到有效解决这个问题的算法：

给定一个未排序的数字数组，需要将其分成长度从a到b的多个子数组，使得每个子数组中的最小和最大数字的差值之和最大。必须保留数字的顺序。

示例：
a = 3, b = 7
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
answer: [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]] (diff sum is 12)

a = 3, b = 4
input: [1, 6, 2, 2, 5, 2, 8, 1, 5, 6]
answer: [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]] (diff sum is 16)

a = 4, b = 5
input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2]
answer: splitting is impossible

到目前为止我想出的唯一解决方案是尝试所有可能的子数组组合。

from collections import deque

def partition_array(numbers, min_len, max_len):
  max_diff_subarray = None

  queue = deque()

  for end in range(min_len - 1, max_len):
    if end < len(numbers):
      diff = max(numbers[0:end + 1]) - min(numbers[0:end + 1])
      queue.append(Subarray(previous=None, start=0, end=end, diff_sum=diff))

  while queue:
    subarray = queue.popleft()

    if subarray.end == len(numbers) - 1:
      if max_diff_subarray is None:
        max_diff_subarray = subarray
      elif max_diff_subarray.diff_sum < subarray.diff_sum:
        max_diff_subarray = subarray
      continue

    start = subarray.end + 1

    for end in range(start + min_len - 1, start + max_len):
      if end < len(numbers):
        diff = max(numbers[start:end + 1]) - min(numbers[start:end + 1])
        queue.append(Subarray(previous=subarray, start=start, end=end, diff_sum=subarray.diff_sum + diff))
      else:
        break

  return max_diff_subarray

class Subarray:
  def __init__(self, previous=None, start=0, end=0, diff_sum=0):
    self.previous = previous
    self.start = start
    self.end = end
    self.diff_sum = diff_sum

numbers = [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
a = 3
b = 7
result = partition_array(numbers, a, b)
print(result.diff_sum)

有没有更省时的解决方案？

Answer 1

首先我们来解决一个更简单的问题。让我们遍历一个数组，并给出所有固定大小窗口的最小值和最大值。

def window_mins_maxes (size, array):
    min_values = deque()
    min_positions = deque()
    max_values = deque()
    max_positions = deque()

    for i, value in enumerate(array):
        if size <= i:
            yield (i, min_values[0], max_values[0])
            if min_positions[0] <= i - size:
                min_values.popleft()
                min_positions.popleft()

            if max_positions[0] <= i - size:
                max_values.popleft()
                max_positions.popleft()

        while 0 < len(min_values) and value <= min_values[-1]:
            min_values.pop()
            min_positions.pop()
        min_values.append(value)
        min_positions.append(i)

        while 0 < len(max_values) and max_values[-1] <= value:
            max_values.pop()
            max_positions.pop()
        max_values.append(value)
        max_positions.append(i)

    yield (len(array), min_values[0], max_values[0])

这显然需要记忆

O(size)

。不太明显的是，处理长度为

O(n)

的数组需要时间

。但我们可以通过摊销分析看到这一点。对于每个元素，我们将归因于检查小于它的可能值的成本、某些后续元素检查它是否应该被删除的成本以及添加的成本。这说明了所有操作（尽管这不是它们发生的顺序），并且每个元素的工作量是固定的。

另请注意，这部分解决方案所需的内存在

O(n)

范围内。

到目前为止，我认为这是一个众所周知的动态规划问题。现在让我们让它更具挑战性。

我们将把分区问题当作传统的动态规划问题来解决。我们将构建一个数组

best_weight

表示该点的最佳分区，以及

prev_index

表示在该点之前结束的上一个分区的开始位置。

为了构建它，我们将使用上述算法获取先前的分区并向其中添加

min_len

之一。如果它比之前的更好，我们会将其信息保存在这些数组中。然后，我们将从该分区向前扫描，直到

max_len

。然后我们继续进行下一个可能的分区开始。

完成后，我们将从该代码中找到答案。

看起来像这样：

def partition_array(numbers, min_len, max_len):
    if max_len < min_len or len(numbers) < min_len:
        return (None, None)

    best_weight = [None for _ in numbers]
    prev_index = [None for _ in numbers]

    # Need an extra entry for off of the end of the array.
    best_weight.append(None)
    prev_index.append(None)

    best_weight[0] = 0

    for i, min_value, max_value in window_mins_maxes(min_len, numbers):
        window_start_weight = best_weight[i - min_len]
        if window_start_weight is not None:
            j = i
            while j - i < max_len - min_len and j < len(numbers):
                new_weight = window_start_weight + max_value - min_value
                if best_weight[j] is None or best_weight[j] < new_weight:
                    best_weight[j] = new_weight
                    prev_index[j] = i - min_len

                if numbers[j] < min_value:
                    min_value = numbers[j]
                if max_value < numbers[j]:
                    max_value = numbers[j]
                j += 1

            # And fill in the longest value.
            new_weight = window_start_weight + max_value - min_value
            if best_weight[j] is None or best_weight[j] < new_weight:
                best_weight[j] = new_weight
                prev_index[j] = i - min_len

    if best_weight[-1] is None:
        return (None, None)
    else:
        path = [len(numbers)]
        while prev_index[path[-1]] is not None:
            path.append(prev_index[path[-1]])
        path = list(reversed(path))
        partitioned = [numbers[path[i]:path[i+1]] for i in range(len(path)-1)]
        return (best_weight[-1], partitioned)

请注意，我们对每个可能的开始和长度都进行

O(1)

工作。这就是时间

O((max_len + 1 - min_len)*n)

。我们使用的数据结构的大小都以

O(n)

为界。提供我在评论中承诺的整体效率。

现在让我们测试一下。

print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1], 3, 7))
print(partition_array([1, 6, 2, 2, 5, 2, 8, 1, 5, 6], 3, 4))
print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2], 4, 5))

输出为：

(12, [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]])
(16, [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]])
(None, None)

Answer 2

有没有更省时的解决方案？

是的，有。

btilly的答案表明，问题代码中提出的算法需要随着数组大小呈指数增加时间，可以通过具有几乎线性时间依赖性的动态编程方法来击败。

还有可能做得更好吗？还有进步的空间吗？

让我们从构建以下测试用例开始，其中三个不同的分区选项提供相同的最佳结果，并且整数列表的大小允许可靠地测量所需的计算时间：

from time import perf_counter as T
from random import randint, seed 
seed(2); arr=[]
for _ in range(100000): arr.append(randint(1,10000000))
sT=T(); result=partition_array(arr, 49999, 50001); eT=T()
print(eT-sT)
print("btl", result[0], result[1] , end='')

并使用

btilly 提供的代码运行它。

在我的机器上我得到的输出：

0.10844088008161634
btl 19998737 [0, 49999, 100000]

btl（

btilly的快捷方式）代码需要在运行Python 3.10.12的Linux Mint 21.2 Xfce Thinkpad笔记本上大约0.1秒得出解决方案，并使用window_mins_maxes()

函数完成大部分工作，因此对其进行了优化使用

deque

 模块中的

collections

 数据类型。该代码输出最佳总和以及整数列表分区的可能多个解决方案之一。

在

window_mins_maxes()

 的另一个版本下面，我将其命名为

winMinMaxLargeSize()

，以将其与更简单的

winMinMax()

 代码区分开来，该代码在小整数列表上运行速度非常快，但在大整数列表上运行速度很慢：

def winMinMax(size, array):
    for i  in range(size, len(array)+1):
        yield i, min(array[ i-size : i ]), max(array[ i -size : i])

def winMinMaxLargeSize(size, L):
    """ Let's optimize min/max calculation by re-using already got values what starts to 
    make sense probably only at larger sizes (for example greater than 7).  
    This function moves a "Window" with width equal 'size' over the list 'L' calculating 
    the min/max values and remembering their list "L' indices. The indices are used then
    to skip recalculating of min/max values in case the value "leaving" the "Window" has
    no impact on the min/max values because in the range between min and max. The
    variable names storing these indices are 'indxMin' and 'indxMax' """

    # Lets calculate the min/max values for the first "Window" at the beginning of 'L' to 
    # have a reference start values for "moving" of the "Window" over the list  'L'  : 
    prevMin, prevMax = min(L[0 : size ]), max(L[0: size])
    lenL=len(L)
    indxCounterMin=0; indxCounterMax=0
    for i in range(size)[::-1] :
        if indxCounterMin == 0 and L[i] == prevMin :
              indxMin = i
              indxCounterMin+= 1
        if indxCounterMax == 0 and L[i] == prevMax :
              indxMax = i
              indxCounterMax+= 1
        if indxCounterMax and indxCounterMin :  # index positions of min/max are found
            break                               #   so we may exit the search loop . 
    #print(f"i={size=} , {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}")
    yield size, prevMin, prevMax

    for i  in range(size+1, lenL+1):
        if ( i - size - 1 ) == indxMin:
            prevMin=min(L[ i - size : i ])
            for indx in range( i - size, i )[::-1]:
                if L[indx] == prevMin :
                    indxMin = indx
                    break
        elif ( i  - size  - 1 ) == indxMax:
            prevMax=max(L[ i - size : i ])
            for indx in list(range(i - size , i ))[::-1]:
                if L[indx] == prevMax :
                    indxMax = indx
                    break
        else:   
            if  L[ i  - 1 ] <= prevMin :
                indxMin = i  -  1  
                if      L[ i  - 1 ] < prevMin :
                    prevMin = L[ i  - 1 ]
            elif    L[ i  - 1 ] >= prevMax :
                indxMax = i  - 1
                if      L[ i  - 1 ] > prevMax :
                    prevMax = L[ i  - 1 ]
             
        #print(f"  >  {i=:2d}, {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}")
        yield i, prevMin, prevMax

将“btl”代码中使用的函数替换为上面的函数会得到以下输出：

0.05169699003454298
btl-opt 19998737 [0, 49999, 100000]

哇！通过代码的“btl-opt”优化，可以将时间缩短一半并达到

0.05 秒。

*这是否已经是所能达到的极限，还是还有改进的空间？

让我们把动态规划方法放在一边，使用另一种方法，通过“暴力”来解决问题，并检查所有可能的列表分区情况。作为副作用，它会给出多个分区解决方案（如果有的话）：

def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum(L, targetSum, n=None, m=None):
    if n is None:       n  = 1
    if m is None:   m = targetSum
    lenL = len(L)
    # print(f"{targetSum=}, {L=}, {n=}, {m=}")

    Stack           = []
    
    # Initialize the Stack with the starting point for each element in L
    for i in range(lenL):
        currentSum  =   L[ i ]
        path        = [   L[ i ]   ]
        start       = 0         # Start from 0 allows revisiting of all items
        Stack.append(   (currentSum, path, start )   )  

    while Stack:
        currentSum, path, start = Stack.pop()
        # Check if the current path meets the criteria
        if currentSum == targetSum and n <= len(path) <= m:
            yield path
        if currentSum > targetSum or len(path) > m:
            continue  
        # ^ - NEXT please: stop exploring this path as it's not valid or complete

        # Continue to build the path by adding elements from L, starting from 0 index
        for i in range(len(L)):  # Change start to 0 if you want to include all permutations
            newSum = currentSum + L[ i  ]
            newPath = path + [ L[ i  ]  ]
            Stack.append((newSum, newPath, 0))  # Start from 0 allows every possibility
# def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum

splitsGenerator = allArrangementsOfIntegerItemsInLsummingUpTOtargetSum

def arrSplit(Arr, m, n) :
    targetSum=len(Arr)
    L=list(range( m, n+1) )
    bestSum = None
    for split in splitsGenerator( L, targetSum )  :
        #print("###", split)
        startIndx   = 0
        endIndx = 0
        currSum = 0
        for indx in split :
            endIndx += indx
            currSum = currSum + max(Arr[startIndx : endIndx ]) - min(Arr[ startIndx : endIndx  ])
            startIndx=startIndx + endIndx

        if bestSum is None:
            bestSum = currSum
            bestSplit   = [ split ]
            continue

        if bestSum <= currSum :
            if bestSum == currSum :
                bestSplit.append(split)
            else: 
                bestSum = currSum
                bestSplit = [ split ]


    return bestSum, bestSplit

targetSum = 10000
from time import perf_counter as T
from random import randint, seed 
seed(2); arr=[]
for _ in range(100000): arr.append(randint(1,10000000))
sT=T(); result=arrSplit(arr, 49999, 50001); eT=T()
print(eT-sT)
print("oOo", result[0], result[1], end='')

代码输出：

0.012507450999692082
oOo 19998737 [[50001, 49999], [50000, 50000], [49999, 50001]]

并且击败了已经非常快的代码，我们已经开始了旅程，速度快了十倍，时间为

0.01秒。

这还可以进一步提高一个数量级吗？我很确定它可以比这个更进一步，所以仍然有很大的改进空间，并且可以用更快的代码来继续这个旅程。

愿oOo的力量与你同在！

将整数数组拆分为具有 min 和 max 之间的最大差值之和的子数组

问题描述投票：0回答：2

2个回答

最新问题

将整数数组拆分为具有 min 和 max 之间的最大差值之和的子数组

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2