Python Pulp - 0/1 背包问题 - 如何确保项目(ID)最多被选择一次

问题描述 投票:0回答:1

我有点不适应:我是一名统计学家,主要从事 SAS 和 R 工作。我正在尝试使用 Pulp 在 Python 中编写本质上经典的背包问题的代码。我的数据看起来像这样(这是一个简化版本):

fan.csv 示例数据

id       Fan    Cost    Switch
10001   32.83   3.18    A/B
10003   75.25   4.20    C
10005   71.60   8.79    A/B
10010   24.23   9.99    C/D
10011   98.69   8.88    D
10012   48.81   8.68    C/D

我需要在每个 Switch 类别中选择 1 或 2 个项目(例如 1 个“A”、2 个“D”等)时最大化粉丝,同时保持在成本阈值以下。到目前为止,相当标准。

我的问题是,某些项目可能属于一个或另一个 Switch 类别(请参阅 Switch 值中带有“/”的 obs - 例如,id 10001 可以充当“A”或“B”)。因为我是一名统计学家,所以我认为就 n x k 数据集而言,在这种情况下,我将以长形式设置数据(就好像这些是重复测量数据一样):

id       Fan    Cost    Switch
10001   32.83   3.18    A
10001   32.83   3.18    B
10003   75.25   4.20    C
10005   71.60   8.79    A
10005   71.60   8.79    B
10010   24.23   9.99    C
10010   24.23   9.99    D
10011   98.69   8.88    D
10012   48.81   8.68    C
10012   48.81   8.68    D

但是,我基本上采用了其他人的示例代码来调用 Pulp 优化模型,并根据我的情况进行了调整。 我当前的问题是,我不清楚如何向线性优化模型添加约束,以便每个项目(每个 id)最多只选择一次。这是一个问题,因为我“解压”Switch 变量并列出了具有不同 Switch 值的多个 id。

我的代码出现在下面。在步骤 1 中,我将 fan.csv 数据转换为长格式(我自己编写了此代码)。第 2 步主要是我改编的示例数据。示例代码将数据组织在嵌套字典中,然后调用 Pulp 函数来指定优化模型。

import os
import pandas as pd
from pulp import *
import numpy as np

# Step 1

os.chdir("E:\\")

df01 = pd.read_csv("fan.csv")

df02 = df01[df01['Switch'].str.contains('/')]
df03 = df01[~df01['Switch'].str.contains('/')]

df02[['Swit1','Swit2']] = df02.Switch.str.split("/", expand=True)
df04 = pd.wide_to_long(df02, ["Swit"], i="id", j="Pos")
df04.reset_index(inplace=True)
df04.drop(["Pos"], axis=1, inplace=True)

df05 = pd.concat([df04,df03])
df06 = df05.sort_values(by=['id'])
df06.reset_index(drop=True, inplace=True)

df06.loc[(~df06['Switch'].str.contains('/')), 'Swit'] = df06.Switch

# Step 2    

df07 = df06.groupby(["Swit", "id", "Fan", "Cost"]).agg('count')
df07 = df07.reset_index()

costs = {}
fans = {}

for sth in df07.Swit.unique():
    df07_sth = df07[df07.Swit == sth]
    cost = list(df07_sth[['id', 'Cost']].set_index("id").to_dict().values())[0]
    fan = list(df07_sth[['id', 'Fan']].set_index("id").to_dict().values())[0]
    costs[sth] = cost
    fans[sth] = fan


sth_num_available = {
    "A": 1,
    "B": 1,
    "C": 1,
    "D": 2,
                    }

cost_cap = 25

_vars = {k: LpVariable.dict(k, v, cat='Binary') for k, v in fans.items()}

model1 = LpProblem("Fans", LpMaximize)
fanval = []
costval = []
switch_constraints = []

for k, v in _vars.items():
    costval += lpSum([costs[k][i] * _vars[k][i] for i in v])
    fanval += lpSum([fans[k][i] * _vars[k][i] for i in v])
    model1 += lpSum([_vars[k][i] for i in v]) == sth_num_available[k]

model1 += lpSum(fanval)
model1 += lpSum(costval) <= cost_cap
model1

model1.solve()
  • 需要明确的是,我上面列出的 fan.csv 数据的数据行数太少,以至于 Pulp 发现该问题不可行。这些数据只是本文的示例数据。实际上,我有大约 1,000 行数据,并且数据每天都会变化和更新。

编辑1

我还有 2 个额外的限制要添加。我将其包含在这篇文章中,因为这个问题与最初的文章相关。

扩展后的 fan.csv 示例数据为:

id       Fan    Cost    Switch  Hit Group   Cat
10001   32.83   3.18    A/B       0   G1    C1
10003   75.25   4.20    C         1   G1    C2
10005   71.60   8.79    A/B       0   G1    C3
10010   24.23   9.99    C/D       1   G2    C1
10011   98.69   8.88    D         1   G2    C2
10012   48.81   8.68    C/D       1   G2    C3

新约束1

背包解决方案最多可以包含 n id,编码为 1,用于表示来自任何一组的命中。

我尝试针对此约束调整 Reinderien 的代码。这里 n 设置为 2:

df02 = df01
df02.set_index(["id","Group",], drop=False, inplace=True)
for G, total in df02.Assign.groupby(level='Group').sum().items():
    model1.addConstraint(name='Group' + G + 'Hit', constraint=df02.Hit.dot(total) <= 2)

但我收到此错误:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\Programs\Python\Python311\Lib\site-packages\pandas\core\series.py", line 3015, in dot
    if lvals.shape[0] != rvals.shape[0]:
                         ~~~~~~~~~~~^^^
IndexError: tuple index out of range
>>>

新约束2

此外,背包解决方案必须包含至少 2 个类别(Cat)。

这是我改编的代码:

df04 = df02
df04.reset_index(drop=True, inplace=True)
df04.set_index(["id","Cat",], drop=False, inplace=True)
for category, total in df04.Assign.groupby(level='Cat').sum().items():
    model1.addConstraint(name='Cat_' + category, constraint=total >= 2)

运行没有错误并且似乎可以工作。这是最好的方法(或者至少是一个相当好的方法)吗?

================================================

python linear-programming pulp
1个回答
1
投票

我链接的其他答案使用 scipy.在这里,我演示了 PuLP,因为它就是您所使用的。旁白:

[我主要工作] SAS 和 R

R 有自己的线性程序支持,那么为什么要切换到 Python?无论如何,下面演示的主要要点是多重索引使事情变得简单:

import numpy as np import pandas as pd import pulp df = ( pd.read_fwf('fan.csv') .set_index('id') .rename(columns={'Switch': 'SwitchOptions'}) ) # arbitrary columns, id index, values are switch string fragments exploded_switch = df.SwitchOptions.str.split('/', expand=True) # A, B, C, D switch_values = pd.unique(exploded_switch.stack()) # ABCD columns, id index, values are True or NaN to indicate switch match switch_sparse = pd.DataFrame( ( exploded_switch.values[:, :, np.newaxis] == switch_values[np.newaxis, np.newaxis, :] ).any(axis=1), index=df.index, columns=pd.Index(name='Switch', data=switch_values), ) switch_sparse[~switch_sparse] = np.nan # Multi-index to 10 ID/switch values df = switch_sparse.stack().align(df)[1] # Reasonable assignment variable names; will be unique var_names = ( 'asn_i' + df.index.get_level_values('id').astype(str) + '_s' + df.index.get_level_values('Switch') ).to_series(index=df.index) # Assignment variables, named, binary, with same ID/switch multi-index df['Assign'] = var_names.apply(pulp.LpVariable, cat=pulp.LpBinary) # Indexed total switch availability series switch_avail = pd.Series( index=switch_sparse.columns, data=(1, 1, 1, 2), ) # Maximize total Fan via dot product model = pulp.LpProblem(name='Fans', sense=pulp.LpMaximize) model.objective = df.Assign.dot(df.Fan) model.addConstraint(name='cost_cap', constraint=df.Assign.dot(df.Cost) <= 25) # Constrain assignments by switch availability for switch, total in df.Assign.groupby(level='Switch').sum().items(): model.addConstraint(name='switch_' + switch, constraint=total <= switch_avail[switch]) # Constrain IDs to be assigned exclusively. Only add constraints where more than one switch is possible. id_groups = df.Assign.groupby(level='id') for ID, total in id_groups.sum()[id_groups.count() > 1].items(): model.addConstraint(name=f'excl_id_{ID}', constraint=total <= 1) print(model) model.solve() assert model.status == pulp.LpStatusOptimal df['Assign'] = df.Assign.apply(pulp.LpVariable.value) print(df)
Fans:
MAXIMIZE
32.83*asn_i10001_sA + 32.83*asn_i10001_sB + 75.25*asn_i10003_sC + 71.6*asn_i10005_sA + 71.6*asn_i10005_sB + 24.23*asn_i10010_sC + 24.23*asn_i10010_sD + 98.69*asn_i10011_sD + 48.81*asn_i10012_sC + 48.81*asn_i10012_sD + 0.0
SUBJECT TO
cost_cap: 3.18 asn_i10001_sA + 3.18 asn_i10001_sB + 4.2 asn_i10003_sC
 + 8.79 asn_i10005_sA + 8.79 asn_i10005_sB + 9.99 asn_i10010_sC
 + 9.99 asn_i10010_sD + 8.88 asn_i10011_sD + 8.68 asn_i10012_sC
 + 8.68 asn_i10012_sD <= 25

switch_A: asn_i10001_sA + asn_i10005_sA <= 1

switch_B: asn_i10001_sB + asn_i10005_sB <= 1

switch_C: asn_i10003_sC + asn_i10010_sC + asn_i10012_sC <= 1

switch_D: asn_i10010_sD + asn_i10011_sD + asn_i10012_sD <= 2

excl_id_10001: asn_i10001_sA + asn_i10001_sB <= 1

excl_id_10005: asn_i10005_sA + asn_i10005_sB <= 1

excl_id_10010: asn_i10010_sC + asn_i10010_sD <= 1

excl_id_10012: asn_i10012_sC + asn_i10012_sD <= 1

VARIABLES
0 <= asn_i10001_sA <= 1 Integer
0 <= asn_i10001_sB <= 1 Integer
0 <= asn_i10003_sC <= 1 Integer
0 <= asn_i10005_sA <= 1 Integer
0 <= asn_i10005_sB <= 1 Integer
0 <= asn_i10010_sC <= 1 Integer
0 <= asn_i10010_sD <= 1 Integer
0 <= asn_i10011_sD <= 1 Integer
0 <= asn_i10012_sC <= 1 Integer
0 <= asn_i10012_sD <= 1 Integer

Welcome to the CBC MILP Solver 
Version: 2.10.3 
Build Date: Dec 15 2019 

At line 2 NAME          MODEL
At line 3 ROWS
At line 14 COLUMNS
At line 73 RHS
At line 83 BOUNDS
At line 94 ENDATA
Problem MODEL has 9 rows, 10 columns and 28 elements
Coin0008I MODEL read with 0 errors
Option for timeMode changed from cpu to elapsed
Continuous objective value is 277.963 - 0.00 seconds
Cgl0004I processed model has 9 rows, 10 columns (10 integer (10 of which binary)) and 28 elements
Cutoff increment increased from 1e-05 to 0.00999
Cbc0038I Initial state - 1 integers unsatisfied sum - 0.00568828
Cbc0038I Pass   1: suminf.    0.00563 (1) obj. -277.814 iterations 3
Cbc0038I Solution found of -179.68
...

Result - Optimal solution found

Objective value:                255.58000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.00
Time (Wallclock seconds):       0.00

Option for printingOptions changed from normal to all
Total time (CPU seconds):       0.00   (Wallclock seconds):       0.00

                Fan  Cost SwitchOptions  Assign
id    Switch                                   
10001 A       32.83  3.18           A/B     1.0
      B       32.83  3.18           A/B     0.0
10003 C       75.25  4.20             C     1.0
10005 A       71.60  8.79           A/B     0.0
      B       71.60  8.79           A/B     0.0
10010 C       24.23  9.99           C/D     0.0
      D       24.23  9.99           C/D     0.0
10011 D       98.69  8.88             D     1.0
10012 C       48.81  8.68           C/D     0.0
      D       48.81  8.68           C/D     1.0
    
© www.soinside.com 2019 - 2024. All rights reserved.