我正在尝试找到有效解决这个问题的算法:
给定一个未排序的数字数组,需要将其分成长度从a到b的多个子数组,使得每个子数组中的最小和最大数字的差值之和最大。必须保留数字的顺序。
示例:
a = 3, b = 7 input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1] answer: [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]] (diff sum is 12) a = 3, b = 4 input: [1, 6, 2, 2, 5, 2, 8, 1, 5, 6] answer: [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]] (diff sum is 16) a = 4, b = 5 input: [5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2] answer: splitting is impossible
到目前为止我想出的唯一解决方案是尝试所有可能的子数组组合。
from collections import deque
def partition_array(numbers, min_len, max_len):
max_diff_subarray = None
queue = deque()
for end in range(min_len - 1, max_len):
if end < len(numbers):
diff = max(numbers[0:end + 1]) - min(numbers[0:end + 1])
queue.append(Subarray(previous=None, start=0, end=end, diff_sum=diff))
while queue:
subarray = queue.popleft()
if subarray.end == len(numbers) - 1:
if max_diff_subarray is None:
max_diff_subarray = subarray
elif max_diff_subarray.diff_sum < subarray.diff_sum:
max_diff_subarray = subarray
continue
start = subarray.end + 1
for end in range(start + min_len - 1, start + max_len):
if end < len(numbers):
diff = max(numbers[start:end + 1]) - min(numbers[start:end + 1])
queue.append(Subarray(previous=subarray, start=start, end=end, diff_sum=subarray.diff_sum + diff))
else:
break
return max_diff_subarray
class Subarray:
def __init__(self, previous=None, start=0, end=0, diff_sum=0):
self.previous = previous
self.start = start
self.end = end
self.diff_sum = diff_sum
numbers = [5, 8, 4, 5, 1, 3, 5, 1, 3, 1]
a = 3
b = 7
result = partition_array(numbers, a, b)
print(result.diff_sum)
有没有更省时的解决方案?
首先我们来解决一个更简单的问题。让我们遍历一个数组,并给出所有固定大小窗口的最小值和最大值。
def window_mins_maxes (size, array):
min_values = deque()
min_positions = deque()
max_values = deque()
max_positions = deque()
for i, value in enumerate(array):
if size <= i:
yield (i, min_values[0], max_values[0])
if min_positions[0] <= i - size:
min_values.popleft()
min_positions.popleft()
if max_positions[0] <= i - size:
max_values.popleft()
max_positions.popleft()
while 0 < len(min_values) and value <= min_values[-1]:
min_values.pop()
min_positions.pop()
min_values.append(value)
min_positions.append(i)
while 0 < len(max_values) and max_values[-1] <= value:
max_values.pop()
max_positions.pop()
max_values.append(value)
max_positions.append(i)
yield (len(array), min_values[0], max_values[0])
这显然需要记忆
O(size)
。不太明显的是,处理长度为 O(n)
的数组需要时间 n
。但我们可以通过摊销分析看到这一点。对于每个元素,我们将归因于检查小于它的可能值的成本、某些后续元素检查它是否应该被删除的成本以及添加的成本。这说明了所有操作(尽管这不是它们发生的顺序),并且每个元素的工作量是固定的。
另请注意,这部分解决方案所需的内存在
O(n)
范围内。
到目前为止,我认为这是一个众所周知的动态规划问题。现在让我们让它更具挑战性。
我们将把分区问题当作传统的动态规划问题来解决。我们将构建一个数组
best_weight
表示该点的最佳分区,以及 prev_index
表示在该点之前结束的上一个分区的开始位置。
为了构建它,我们将使用上述算法获取先前的分区并向其中添加
min_len
之一。如果它比之前的更好,我们会将其信息保存在这些数组中。然后,我们将从该分区向前扫描,直到 max_len
。然后我们继续进行下一个可能的分区开始。
完成后,我们将从该代码中找到答案。
看起来像这样:
def partition_array(numbers, min_len, max_len):
if max_len < min_len or len(numbers) < min_len:
return (None, None)
best_weight = [None for _ in numbers]
prev_index = [None for _ in numbers]
# Need an extra entry for off of the end of the array.
best_weight.append(None)
prev_index.append(None)
best_weight[0] = 0
for i, min_value, max_value in window_mins_maxes(min_len, numbers):
window_start_weight = best_weight[i - min_len]
if window_start_weight is not None:
j = i
while j - i < max_len - min_len and j < len(numbers):
new_weight = window_start_weight + max_value - min_value
if best_weight[j] is None or best_weight[j] < new_weight:
best_weight[j] = new_weight
prev_index[j] = i - min_len
if numbers[j] < min_value:
min_value = numbers[j]
if max_value < numbers[j]:
max_value = numbers[j]
j += 1
# And fill in the longest value.
new_weight = window_start_weight + max_value - min_value
if best_weight[j] is None or best_weight[j] < new_weight:
best_weight[j] = new_weight
prev_index[j] = i - min_len
if best_weight[-1] is None:
return (None, None)
else:
path = [len(numbers)]
while prev_index[path[-1]] is not None:
path.append(prev_index[path[-1]])
path = list(reversed(path))
partitioned = [numbers[path[i]:path[i+1]] for i in range(len(path)-1)]
return (best_weight[-1], partitioned)
请注意,我们对每个可能的开始和长度都进行
O(1)
工作。这就是时间O((max_len + 1 - min_len)*n)
。我们使用的数据结构的大小都以 O(n)
为界。提供我在评论中承诺的整体效率。
现在让我们测试一下。
print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1], 3, 7))
print(partition_array([1, 6, 2, 2, 5, 2, 8, 1, 5, 6], 3, 4))
print(partition_array([5, 8, 4, 5, 1, 3, 5, 1, 3, 1, 2], 4, 5))
输出为:
(12, [[5, 8, 4], [5, 1, 3], [5, 1, 3, 1]])
(16, [[1, 6, 2], [2, 5, 2, 8], [1, 5, 6]])
(None, None)
有没有更省时的解决方案?
是的,有。
btilly的答案表明,问题代码中提出的算法需要随着数组大小呈指数增加时间,可以通过具有几乎线性时间依赖性的动态编程方法来击败。
还有可能做得更好吗?还有进步的空间吗?
让我们从构建以下测试用例开始,其中三个不同的分区选项提供相同的最佳结果,并且整数列表的大小允许可靠地测量所需的计算时间:
from time import perf_counter as T
from random import randint, seed
seed(2); arr=[]
for _ in range(100000): arr.append(randint(1,10000000))
sT=T(); result=partition_array(arr, 49999, 50001); eT=T()
print(eT-sT)
print("btl", result[0], result[1] , end='')
并使用 btilly 提供的代码运行它。
在我的机器上我得到的输出:
0.10844088008161634
btl 19998737 [0, 49999, 100000]
btl(btilly的快捷方式)代码需要在运行Python 3.10.12的Linux Mint 21.2 Xfce Thinkpad笔记本上大约0.1秒得出解决方案,并使用window_mins_maxes()
函数完成大部分工作,因此对其进行了优化使用
deque
模块中的
collections
数据类型。该代码输出最佳总和以及整数列表分区的可能多个解决方案之一。在
window_mins_maxes()
的另一个版本下面,我将其命名为
winMinMaxLargeSize()
,以将其与更简单的
winMinMax()
代码区分开来,该代码在小整数列表上运行速度非常快,但在大整数列表上运行速度很慢:
def winMinMax(size, array):
for i in range(size, len(array)+1):
yield i, min(array[ i-size : i ]), max(array[ i -size : i])
def winMinMaxLargeSize(size, L):
""" Let's optimize min/max calculation by re-using already got values what starts to
make sense probably only at larger sizes (for example greater than 7).
This function moves a "Window" with width equal 'size' over the list 'L' calculating
the min/max values and remembering their list "L' indices. The indices are used then
to skip recalculating of min/max values in case the value "leaving" the "Window" has
no impact on the min/max values because in the range between min and max. The
variable names storing these indices are 'indxMin' and 'indxMax' """
# Lets calculate the min/max values for the first "Window" at the beginning of 'L' to
# have a reference start values for "moving" of the "Window" over the list 'L' :
prevMin, prevMax = min(L[0 : size ]), max(L[0: size])
lenL=len(L)
indxCounterMin=0; indxCounterMax=0
for i in range(size)[::-1] :
if indxCounterMin == 0 and L[i] == prevMin :
indxMin = i
indxCounterMin+= 1
if indxCounterMax == 0 and L[i] == prevMax :
indxMax = i
indxCounterMax+= 1
if indxCounterMax and indxCounterMin : # index positions of min/max are found
break # so we may exit the search loop .
#print(f"i={size=} , {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}")
yield size, prevMin, prevMax
for i in range(size+1, lenL+1):
if ( i - size - 1 ) == indxMin:
prevMin=min(L[ i - size : i ])
for indx in range( i - size, i )[::-1]:
if L[indx] == prevMin :
indxMin = indx
break
elif ( i - size - 1 ) == indxMax:
prevMax=max(L[ i - size : i ])
for indx in list(range(i - size , i ))[::-1]:
if L[indx] == prevMax :
indxMax = indx
break
else:
if L[ i - 1 ] <= prevMin :
indxMin = i - 1
if L[ i - 1 ] < prevMin :
prevMin = L[ i - 1 ]
elif L[ i - 1 ] >= prevMax :
indxMax = i - 1
if L[ i - 1 ] > prevMax :
prevMax = L[ i - 1 ]
#print(f" > {i=:2d}, {indxMin=}, {indxMax=}, {prevMin=}, {prevMax=}")
yield i, prevMin, prevMax
将“btl”代码中使用的函数替换为上面的函数会得到以下输出:
0.05169699003454298
btl-opt 19998737 [0, 49999, 100000]
哇!通过代码的“btl-opt”优化,可以将时间缩短一半并达到 0.05 秒。
*这是否已经是所能达到的极限,还是还有改进的空间?让我们把动态规划方法放在一边,使用另一种方法,通过“暴力”来解决问题,并检查所有可能的列表分区情况。作为副作用,它会给出多个分区解决方案(如果有的话):
def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum(L, targetSum, n=None, m=None):
if n is None: n = 1
if m is None: m = targetSum
lenL = len(L)
# print(f"{targetSum=}, {L=}, {n=}, {m=}")
Stack = []
# Initialize the Stack with the starting point for each element in L
for i in range(lenL):
currentSum = L[ i ]
path = [ L[ i ] ]
start = 0 # Start from 0 allows revisiting of all items
Stack.append( (currentSum, path, start ) )
while Stack:
currentSum, path, start = Stack.pop()
# Check if the current path meets the criteria
if currentSum == targetSum and n <= len(path) <= m:
yield path
if currentSum > targetSum or len(path) > m:
continue
# ^ - NEXT please: stop exploring this path as it's not valid or complete
# Continue to build the path by adding elements from L, starting from 0 index
for i in range(len(L)): # Change start to 0 if you want to include all permutations
newSum = currentSum + L[ i ]
newPath = path + [ L[ i ] ]
Stack.append((newSum, newPath, 0)) # Start from 0 allows every possibility
# def allArrangementsOfIntegerItemsInLsummingUpTOtargetSum
splitsGenerator = allArrangementsOfIntegerItemsInLsummingUpTOtargetSum
def arrSplit(Arr, m, n) :
targetSum=len(Arr)
L=list(range( m, n+1) )
bestSum = None
for split in splitsGenerator( L, targetSum ) :
#print("###", split)
startIndx = 0
endIndx = 0
currSum = 0
for indx in split :
endIndx += indx
currSum = currSum + max(Arr[startIndx : endIndx ]) - min(Arr[ startIndx : endIndx ])
startIndx=startIndx + endIndx
if bestSum is None:
bestSum = currSum
bestSplit = [ split ]
continue
if bestSum <= currSum :
if bestSum == currSum :
bestSplit.append(split)
else:
bestSum = currSum
bestSplit = [ split ]
return bestSum, bestSplit
targetSum = 10000
from time import perf_counter as T
from random import randint, seed
seed(2); arr=[]
for _ in range(100000): arr.append(randint(1,10000000))
sT=T(); result=arrSplit(arr, 49999, 50001); eT=T()
print(eT-sT)
print("oOo", result[0], result[1], end='')
代码输出:
0.012507450999692082
oOo 19998737 [[50001, 49999], [50000, 50000], [49999, 50001]]
并且击败了已经非常快的代码,我们已经开始了旅程,速度快了十倍,时间为0.01秒。
这还可以进一步提高一个数量级吗?我很确定它可以比这个更进一步,所以仍然有很大的改进空间,并且可以用更快的代码来继续这个旅程。
愿oOo的力量与你同在!