我发布这篇文章的目标是整合我的其他帖子here,其中要求我提供一个最小的工作示例,但也询问有关代码本身的建议,以防有人可以建议更好/更快/更可靠的方法方法。因此,单独的帖子(与可重复性帖子不在同一范围内)。
您可以想象这个概念的几个用例。
例如。您可以有多个文件要处理,每个文件都有不同数量的记录,并且您希望将它们的处理分布在 N 个并行行之间,确保每行有或多或少相同的要处理的记录总数。
或者,您可能有几个不同重量的物体,需要 N 个人将它们从 A 搬运到 B,并且您希望确保每个人搬运的重量大致相同。
请参阅下面的我的代码,基于线性规划(本质上是一个分配问题加上绝对差之和的最小化)。
由于某种原因,当使用
threads
> 1 运行时,它会在不同的运行中给出不同的结果,如上面的帖子所示。任何改进/替代方法的建议仍然提供绝对最优解决方案(不贪婪),我们将不胜感激。
# Take a list of numbers and partition them into a desired number of sets,
# ensuring that the sum within each set is as close as possible
# to the sum of all numbers divided by the number of sets.
# USER SETTINGS
list_of_numbers = [ 21614, 22716, 1344708, 8948, 136944, 819, 7109,
255182, 556354, 1898763, 1239808, 925193, 173237, 64301,
147896, 824564, 16028, 1021326, 108042, 72221, 368270,
17467, 2953, 52942, 1855, 739627, 460833, 30955]
N_sets_to_make = 4
import numpy as np
import pulp
from pulp import *
# Calculate the desired size of each set
S = N_sets_to_make
average_size = sum(list_of_numbers) / S
sizes = [average_size] * S
# Create the coefficient matrix
N = len(list_of_numbers)
A = np.array(list_of_numbers, ndmin = 2)
# Create the pulp model
prob = LpProblem("Partitioning", LpMinimize)
# Create the pulp variables
# x_names : binary variables encoding the presence of each initial number in each final set
x_names = ['x_'+str(i) for i in range(N * S)]
x = [LpVariable(x_names[i], lowBound = 0, upBound = 1, cat = 'Integer') for i in range(N * S)]
# X_names : continuous positive variables encoding the absolute difference between each final set sum and the desired size
X_names = ['X_'+str(i) for i in range(S)]
X = [LpVariable(X_names[i], lowBound = 0, cat = 'Continuous') for i in range(S)]
# Add the objective to the model (mimimal sum of X_i)
prob += LpAffineExpression([(X[i], 1) for i in range(S) ])
# Add the constraints to the model
# Constraints forcing each initial number to be in one and only one final set
for c in range(N):
prob += LpAffineExpression([(x[c+m*N],+1) for m in range(S)]) == 1
# Constraints forcing each final set to be non-empty
for m in range(S):
prob += LpAffineExpression([(x[i],+1) for i in range(m,(m+1)*N)]) >= 1
# Constraints encoding the absolute values
for m in range(S):
cs = [c for c in range(N) if A[0,c] != 0]
prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) - X[m] <= sizes[m]
prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) + X[m] >= sizes[m]
# Solve the model
prob.solve(PULP_CBC_CMD(gapRel = 0, timeLimit = 3600, threads = 1))
# Extract the solution
values_of_x_i = [value(x[i]) for i in range(N * S)]
selected_ind_initial_numbers = [(list(range(N)) * S)[i] for i,l in enumerate(values_of_x_i) if l == 1]
selected_ind_final_sets = [(list((1 + np.repeat(range(S), N)).astype('int64')))[i] for i,l in enumerate(values_of_x_i) if l == 1]
ind_final_set_for_each_initial_number = [x for _, x in sorted(zip(selected_ind_initial_numbers, selected_ind_final_sets))]
# Find the numbers that ended up in each final set
d = dict()
for m, n in sorted(zip(ind_final_set_for_each_initial_number, list_of_numbers)) :
if m in d :
d[m].append(n)
else :
d[m] = [n]
print(d)
# And their sums
s = [sum(l) for i, l in enumerate(d.values())]
print(s)
# And the absolute differences of their sums from the desired sum
absdiffs = [np.abs(s[i] - sizes[i]) for i in range(len(s))]
print(absdiffs)
# And the absolute fractional differences
print([absdiffs[i]/sizes[i]/S for i in range(len(absdiffs))])
首先,认识到你的玩具问题组合起来相当庞大。
选项数量的very粗略下限是:
c(28, 14) * c(14, 7) * c(7,6) * 1
超过1e+11。真实数字可能还要高出几个数量级。
稍微减小模型大小的一种策略是采用最小最大策略,在该策略中最小化最大集合的最大大小,这取决于您如何定义“比较总和”,这可能没问题。
此 (a) 从您的公式中删除了
S-1
变量,并大幅折叠了约束矩阵。您的绝对值部分减少了 1/2,并且每个约束中只有 1 个变量,而不是 S
变量。
这样做,并将 RelGap 更改为合理,您将获得几乎瞬时的性能。当 RelGap 为 0.0001 时,它可以立即求解。如果您必须解决 100% 最优性保证,那么您仍然需要很长时间,因为问题的组合量级和底层整数结构。
我的写法略有不同,但还不足以担心。平面索引和 numpy 不是必需的。为了清楚起见,我只是对它进行双重索引,而不是像现在一样使用仿射表达式,因为模型会立即构建,并且更容易阅读。 (有关示例,请参阅我的其他
pulp
帖子)
# USER SETTINGS
list_of_numbers = [ 21614, 22716, 1344708, 8948, 136944, 819, 7109,
255182, 556354, 1898763, 1239808, 925193, 173237, 64301,
147896, 824564, 16028, 1021326, 108042, 72221, 368270,
17467, 2953, 52942, 1855, 739627, 460833, 30955]
N_sets_to_make = 4
import numpy as np
import pulp
from pulp import *
# Calculate the desired size of each set
S = N_sets_to_make
average_size = sum(list_of_numbers) / S
sizes = [average_size] * S
# Create the coefficient matrix
N = len(list_of_numbers)
A = np.array(list_of_numbers, ndmin = 2)
# Create the pulp model
prob = LpProblem("Partitioning", LpMinimize)
# Create the pulp variables
# x_names : binary variables encoding the presence of each initial number in each final set
x_names = ['x_'+str(i) for i in range(N * S)]
x = [LpVariable(x_names[i], lowBound = 0, upBound = 1, cat = 'Integer') for i in range(N * S)]
# X_names : continuous positive variables encoding the absolute difference between each final set sum and the desired size
X_names = ['X_'+str(i) for i in range(S)]
# X = [LpVariable(X_names[i], lowBound = 0, cat = 'Continuous') for i in range(S)]
max_sum = LpVariable(name='max_sum', cat='Continuous')
# Add the objective to the model (mimimal sum of X_i)
prob += LpAffineExpression(max_sum)
# Add the constraints to the model
# Constraints forcing each initial number to be in one and only one final set
for c in range(N):
prob += LpAffineExpression([(x[c+m*N],+1) for m in range(S)]) == 1
# Constraints forcing each final set to be non-empty
for m in range(S):
prob += LpAffineExpression([(x[i],+1) for i in range(m,(m+1)*N)]) >= 1
# Constraints encoding the absolute values
for m in range(S):
cs = [c for c in range(N) if A[0,c] != 0]
prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) <= max_sum
# prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) + X[m] >= sizes[m]
# Solve the model
prob.solve(PULP_CBC_CMD(gapRel = 0.0001, timeLimit = 3600, threads = 1))
# Extract the solution
values_of_x_i = [value(x[i]) for i in range(N * S)]
selected_ind_initial_numbers = [(list(range(N)) * S)[i] for i,l in enumerate(values_of_x_i) if l == 1]
selected_ind_final_sets = [(list((1 + np.repeat(range(S), N)).astype('int64')))[i] for i,l in enumerate(values_of_x_i) if l == 1]
ind_final_set_for_each_initial_number = [x for _, x in sorted(zip(selected_ind_initial_numbers, selected_ind_final_sets))]
# Find the numbers that ended up in each final set
d = dict()
for m, n in sorted(zip(ind_final_set_for_each_initial_number, list_of_numbers)) :
if m in d :
d[m].append(n)
else :
d[m] = [n]
print(d)
# And their sums
s = [sum(l) for i, l in enumerate(d.values())]
print(s)
# And the absolute differences of their sums from the desired sum
absdiffs = [np.abs(s[i] - sizes[i]) for i in range(len(s))]
print(absdiffs)
# And the absolute fractional differences
print([absdiffs[i]/sizes[i]/S for i in range(len(absdiffs))])