将正数列表分配到所需数量的集合中,旨在使它们之间的总和尽可能接近

问题描述 投票:0回答:1

我发布这篇文章的目标是整合我的其他帖子here,其中要求我提供一个最小的工作示例,但也询问有关代码本身的建议,以防有人可以建议更好/更快/更可靠的方法方法。因此,单独的帖子(与可重复性帖子不在同一范围内)。

您可以想象这个概念的几个用例。
例如。您可以有多个文件要处理,每个文件都有不同数量的记录,并且您希望将它们的处理分布在 N 个并行行之间,确保每行有或多或少相同的要处理的记录总数。
或者,您可能有几个不同重量的物体,需要 N 个人将它们从 A 搬运到 B,并且您希望确保每个人搬运的重量大致相同。

请参阅下面的我的代码,基于线性规划(本质上是一个分配问题加上绝对差之和的最小化)。

由于某种原因,当使用

threads
> 1 运行时,它会在不同的运行中给出不同的结果,如上面的帖子所示。
但如果我用单线程运行它会慢得多,所以这是一个 catch-22 情况。

任何改进/替代方法的建议仍然提供绝对最优解决方案(不贪婪),我们将不胜感激。

# Take a list of numbers and partition them into a desired number of sets,
# ensuring that the sum within each set is as close as possible
# to the sum of all numbers divided by the number of sets.

# USER SETTINGS
list_of_numbers = [  21614,   22716, 1344708,    8948,  136944,     819,    7109,
         255182,  556354, 1898763, 1239808,  925193,  173237,   64301,
         147896,  824564,   16028, 1021326,  108042,   72221,  368270,
          17467,    2953,   52942,    1855,  739627,  460833,   30955]
N_sets_to_make = 4

import numpy as np
import pulp
from pulp import *

# Calculate the desired size of each set
S = N_sets_to_make
average_size = sum(list_of_numbers) / S
sizes = [average_size] * S

# Create the coefficient matrix
N = len(list_of_numbers)
A = np.array(list_of_numbers, ndmin = 2)

# Create the pulp model
prob = LpProblem("Partitioning", LpMinimize)

# Create the pulp variables
# x_names : binary variables encoding the presence of each initial number in each final set
x_names = ['x_'+str(i) for i in range(N * S)]
x = [LpVariable(x_names[i], lowBound = 0, upBound = 1, cat = 'Integer') for i in range(N * S)]
# X_names : continuous positive variables encoding the absolute difference between each final set sum and the desired size
X_names = ['X_'+str(i) for i in range(S)]
X = [LpVariable(X_names[i], lowBound = 0, cat = 'Continuous') for i in range(S)]

# Add the objective to the model (mimimal sum of X_i)
prob += LpAffineExpression([(X[i], 1) for i in range(S) ])

# Add the constraints to the model

# Constraints forcing each initial number to be in one and only one final set
for c in range(N):
    prob += LpAffineExpression([(x[c+m*N],+1) for m in range(S)]) == 1

# Constraints forcing each final set to be non-empty
for m in range(S):
    prob += LpAffineExpression([(x[i],+1) for i in range(m,(m+1)*N)]) >= 1

# Constraints encoding the absolute values
for m in range(S):    
        cs = [c for c in range(N) if A[0,c] != 0]
        prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) - X[m] <= sizes[m]
        prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) + X[m] >= sizes[m]

# Solve the model
prob.solve(PULP_CBC_CMD(gapRel = 0, timeLimit = 3600, threads = 1))

# Extract the solution
values_of_x_i = [value(x[i]) for i in range(N * S)]
selected_ind_initial_numbers = [(list(range(N)) * S)[i] for i,l in enumerate(values_of_x_i) if l == 1]
selected_ind_final_sets = [(list((1 + np.repeat(range(S), N)).astype('int64')))[i] for i,l in enumerate(values_of_x_i) if l == 1]
ind_final_set_for_each_initial_number = [x for _, x in sorted(zip(selected_ind_initial_numbers, selected_ind_final_sets))]

# Find the numbers that ended up in each final set
d = dict()
for m, n in sorted(zip(ind_final_set_for_each_initial_number, list_of_numbers)) :
    if m in d :
        d[m].append(n)
    else :
        d[m] = [n]
print(d)

# And their sums
s = [sum(l) for i, l in enumerate(d.values())]
print(s)

# And the absolute differences of their sums from the desired sum
absdiffs = [np.abs(s[i] - sizes[i]) for i in range(len(s))]
print(absdiffs)

# And the absolute fractional differences
print([absdiffs[i]/sizes[i]/S for i in range(len(absdiffs))])
python optimization partitioning linear-programming pulp
1个回答
0
投票

首先,认识到你的玩具问题组合起来相当庞大。

选项数量的very粗略下限是:

c(28, 14) * c(14, 7) * c(7,6) * 1

超过1e+11。真实数字可能还要高出几个数量级。

稍微减小模型大小的一种策略是采用最小最大策略,在该策略中最小化最大集合的最大大小,这取决于您如何定义“比较总和”,这可能没问题。

此 (a) 从您的公式中删除了

S-1
变量,并大幅折叠了约束矩阵。您的绝对值部分减少了 1/2,并且每个约束中只有 1 个变量,而不是
S
变量。

这样做,并将 RelGap 更改为合理,您将获得几乎瞬时的性能。当 RelGap 为 0.0001 时,它可以立即求解。如果您必须解决 100% 最优性保证,那么您仍然需要很长时间,因为问题的组合量级和底层整数结构。

我的写法略有不同,但还不足以担心。平面索引和 numpy 不是必需的。为了清楚起见,我只是对它进行双重索引,而不是像现在一样使用仿射表达式,因为模型会立即构建,并且更容易阅读。 (有关示例,请参阅我的其他

pulp
帖子)

轻微修改

# USER SETTINGS
list_of_numbers = [  21614,   22716, 1344708,    8948,  136944,     819,    7109,
         255182,  556354, 1898763, 1239808,  925193,  173237,   64301,
         147896,  824564,   16028, 1021326,  108042,   72221,  368270,
          17467,    2953,   52942,    1855,  739627,  460833,   30955]
N_sets_to_make = 4

import numpy as np
import pulp
from pulp import *

# Calculate the desired size of each set
S = N_sets_to_make
average_size = sum(list_of_numbers) / S
sizes = [average_size] * S

# Create the coefficient matrix
N = len(list_of_numbers)
A = np.array(list_of_numbers, ndmin = 2)

# Create the pulp model
prob = LpProblem("Partitioning", LpMinimize)

# Create the pulp variables
# x_names : binary variables encoding the presence of each initial number in each final set
x_names = ['x_'+str(i) for i in range(N * S)]
x = [LpVariable(x_names[i], lowBound = 0, upBound = 1, cat = 'Integer') for i in range(N * S)]
# X_names : continuous positive variables encoding the absolute difference between each final set sum and the desired size
X_names = ['X_'+str(i) for i in range(S)]
# X = [LpVariable(X_names[i], lowBound = 0, cat = 'Continuous') for i in range(S)]
max_sum = LpVariable(name='max_sum', cat='Continuous')
# Add the objective to the model (mimimal sum of X_i)
prob += LpAffineExpression(max_sum)

# Add the constraints to the model

# Constraints forcing each initial number to be in one and only one final set
for c in range(N):
    prob += LpAffineExpression([(x[c+m*N],+1) for m in range(S)]) == 1

# Constraints forcing each final set to be non-empty
for m in range(S):
    prob += LpAffineExpression([(x[i],+1) for i in range(m,(m+1)*N)]) >= 1

# Constraints encoding the absolute values
for m in range(S):
        cs = [c for c in range(N) if A[0,c] != 0]
        prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) <= max_sum
        # prob += LpAffineExpression([(x[c+m*N],A[0,c]) for c in cs]) + X[m] >= sizes[m]

# Solve the model
prob.solve(PULP_CBC_CMD(gapRel = 0.0001, timeLimit = 3600, threads = 1))

# Extract the solution
values_of_x_i = [value(x[i]) for i in range(N * S)]
selected_ind_initial_numbers = [(list(range(N)) * S)[i] for i,l in enumerate(values_of_x_i) if l == 1]
selected_ind_final_sets = [(list((1 + np.repeat(range(S), N)).astype('int64')))[i] for i,l in enumerate(values_of_x_i) if l == 1]
ind_final_set_for_each_initial_number = [x for _, x in sorted(zip(selected_ind_initial_numbers, selected_ind_final_sets))]

# Find the numbers that ended up in each final set
d = dict()
for m, n in sorted(zip(ind_final_set_for_each_initial_number, list_of_numbers)) :
    if m in d :
        d[m].append(n)
    else :
        d[m] = [n]
print(d)

# And their sums
s = [sum(l) for i, l in enumerate(d.values())]
print(s)

# And the absolute differences of their sums from the desired sum
absdiffs = [np.abs(s[i] - sizes[i]) for i in range(len(s))]
print(absdiffs)

# And the absolute fractional differences
print([absdiffs[i]/sizes[i]/S for i in range(len(absdiffs))])
© www.soinside.com 2019 - 2024. All rights reserved.