Help with an optimization involving matrix operations and constraints

Question (votes: 4, answers: 2)

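I am way out of my league on this one, so I'm hoping someone can point me in the right direction. I think this is an optimization problem, but I have been confused by scipy.optimize and how it fits with PuLP. Matrix math also boggles my mind, so this problem has really been slowing me down.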

Problem statement:

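I have a dataset of customers. For each customer, I can choose one of 3 options, or choose none, so there are 4 possibilities. For each customer, I also have a numerical score that says how "good" each choice is. You can think of this value as the probability that the choice leads to a future sale.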
import pandas as pd

# fake data for the internet
data = {'customerid':[101,102,103,104,105,106,107,108,109,110],
        'prob_CHOICEA':[0.00317,0.00629,0.00242,0.00253,0.00421,0.00414,0.00739,0.00549,0.00658,0.00852],
        'prob_CHOICEB':[0.061,0.087,0.055,0.027,0.022,0.094,0.099,0.072,0.018,0.052],
        'prob_CHOICEC':[0.024,0.013,0.091,0.047,0.071,0.077,0.067,0.046,0.077,0.044]
       } 

# Create the pandas DataFrame, indexed by customer id
df = pd.DataFrame(data).set_index('customerid')
+------------+--------------+--------------+--------------+
| customerid | prob_CHOICEA | prob_CHOICEB | prob_CHOICEC |
+------------+--------------+--------------+--------------+
|        101 |      0.00317 |        0.061 |        0.024 |
|        102 |      0.00629 |        0.087 |        0.013 |
|        103 |      0.00242 |        0.055 |        0.091 |
|        104 |      0.00253 |        0.027 |        0.047 |
|        105 |      0.00421 |        0.022 |        0.071 |
|        106 |      0.00414 |        0.094 |        0.077 |
|        107 |      0.00739 |        0.099 |        0.067 |
|        108 |      0.00549 |        0.072 |        0.046 |
|        109 |      0.00658 |        0.018 |        0.077 |
|        110 |      0.00852 |        0.052 |        0.044 |
+------------+--------------+--------------+--------------+

I started by combining these elements into a single array for each customer:

# combine all values into 1 array
list_to_combine = ['prob_CHOICEA', 'prob_CHOICEB','prob_CHOICEC']

df['probs_A_B_C']= df[list_to_combine].values.tolist()

df.drop(list_to_combine, axis=1, inplace=True)
+------------+-------------------------+
| customerid |       probs_A_B_C       |
+------------+-------------------------+
|        101 | [0.00317, 0.061, 0.024] |
|        102 | [0.00629, 0.087, 0.013] |
|        103 | [0.00242, 0.055, 0.091] |
|        104 | [0.00253, 0.027, 0.047] |
|        105 | [0.00421, 0.022, 0.071] |
|        106 | [0.00414, 0.094, 0.077] |
|        107 | [0.00739, 0.099, 0.067] |
|        108 | [0.00549, 0.072, 0.046] |
|        109 | [0.00658, 0.018, 0.077] |
|        110 | [0.00852, 0.052, 0.044] |
+------------+-------------------------+

For each customer, I have only 4 choices of what to do:

choices = [
    [0,0,0],
    [1,0,0],
    [0,1,0],
    [0,0,1]
]

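For each customer, I want to pick the best choice. At first glance this is easy: just pick the highest number. However, once I start adding constraints, it blows my mind.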
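The unconstrained case described above can be sketched in a few lines. This is only an illustration: the array name probs and the one-hot encoding via np.eye are assumptions for the sketch, not something the question prescribes.

```python
import numpy as np

# Scores from the question, one row per customer (101..110), columns A, B, C
probs = np.array([
    [0.00317, 0.061, 0.024],
    [0.00629, 0.087, 0.013],
    [0.00242, 0.055, 0.091],
    [0.00253, 0.027, 0.047],
    [0.00421, 0.022, 0.071],
    [0.00414, 0.094, 0.077],
    [0.00739, 0.099, 0.067],
    [0.00549, 0.072, 0.046],
    [0.00658, 0.018, 0.077],
    [0.00852, 0.052, 0.044],
])

# Column index of the highest score for every customer
best = probs.argmax(axis=1)

# One-hot rows such as [0, 1, 0], matching the layout of the choices list
choices = np.eye(probs.shape[1], dtype=int)[best]
print(choices)
```

Without constraints every customer receives a choice, so the [0,0,0] option is never used here.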

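For example, what if I wanted to pick the best choice for each customer, but with the constraint that the total number of selected choices = 5?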

+------------+-------------------------+-------------+
| customerid |       probs_A_B_C       | best_choice |
+------------+-------------------------+-------------+
|        101 | [0.00317, 0.061, 0.024] | [0,0,0]     |
|        102 | [0.00629, 0.087, 0.013] | [0,1,0]     |
|        103 | [0.00242, 0.055, 0.091] | [0,0,1]     |
|        104 | [0.00253, 0.027, 0.047] | [0,0,0]     |
|        105 | [0.00421, 0.022, 0.071] | [0,0,0]     |
|        106 | [0.00414, 0.094, 0.077] | [0,1,0]     |
|        107 | [0.00739, 0.099, 0.067] | [0,1,0]     |
|        108 | [0.00549, 0.072, 0.046] | [0,0,0]     |
|        109 | [0.00658, 0.018, 0.077] | [0,0,1]     |
|        110 | [0.00852, 0.052, 0.044] | [0,0,0]     |
+------------+-------------------------+-------------+

I have not even figured out how to accomplish this; I just eyeballed it by hand for illustrative purposes.
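When the only constraint is the total number of selections, a simple ranking is enough: each customer contributes at most one pick, so keeping the K customers whose best score is highest maximizes the total. The sketch below assumes K = 5 as in the example; the names probs, customer_ids and picked are illustrative only.

```python
import numpy as np

customer_ids = np.arange(101, 111)
probs = np.array([
    [0.00317, 0.061, 0.024],
    [0.00629, 0.087, 0.013],
    [0.00242, 0.055, 0.091],
    [0.00253, 0.027, 0.047],
    [0.00421, 0.022, 0.071],
    [0.00414, 0.094, 0.077],
    [0.00739, 0.099, 0.067],
    [0.00549, 0.072, 0.046],
    [0.00658, 0.018, 0.077],
    [0.00852, 0.052, 0.044],
])
K = 5  # total number of selections allowed (from the example above)

best_option = probs.argmax(axis=1)  # best column per customer
best_score = probs.max(axis=1)      # score of that column

# Row positions of the K customers with the highest best score
picked = np.argsort(best_score)[::-1][:K]

# Everyone starts at [0, 0, 0]; picked customers get their best option
choices = np.zeros_like(probs, dtype=int)
choices[picked, best_option[picked]] = 1

print(sorted(customer_ids[picked].tolist()))
print(choices)
```

This shortcut stops working once per-option limits are added, because a pick for one customer then affects what is still allowed for the others.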

Ideally, I would like to add several constraints at the same time:

  • Sum of best_choice = N
  • Sum of CHOICEA (first element of best_choice) >= M
  • Sum of CHOICEB (second element of best_choice) <= 10

Any ideas on where to start?
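Written out as a 0/1 program, the goal and the constraints listed above could look as follows. The symbols are assumptions introduced only for this sketch: n is the number of customers, p_{ij} is the score of option j for customer i, and x_{ij} = 1 means option j is selected for customer i.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} \sum_{j \in \{A,B,C\}} p_{ij}\, x_{ij} \\
\text{s.t.}\quad & \sum_{j \in \{A,B,C\}} x_{ij} \le 1 \qquad \text{for every customer } i \\
& \sum_{i=1}^{n} \sum_{j \in \{A,B,C\}} x_{ij} = N \\
& \sum_{i=1}^{n} x_{iA} \ge M \\
& \sum_{i=1}^{n} x_{iB} \le 10 \\
& x_{ij} \in \{0, 1\}
\end{aligned}
```

The first constraint encodes "one of the three options or none"; the remaining three are the bullet points above.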


python optimization linear-algebra matrix-multiplication
2 Answers
3
votes

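You can use scipy.optimize.linprog to solve this linear optimization problem. As outlined in the docs, it requires the boundary conditions to be set up as matrix products. There are two types of boundary conditions: inequalities of the form A @ x <= b and equalities A @ x == b. The problem can be modeled as follows:

  • The resulting vector x has length N*C, where N is the number of customers and C is the number of options; it represents the choices per customer in a linear layout.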
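A minimal sketch of this model with scipy.optimize.linprog is shown below. Several things in it are assumptions rather than part of the answer: the customer-major ordering of x, the limits N_TOTAL = 7, M_MIN_A = 3 and B_MAX = 2 (a cap of 10 on CHOICEB would never bind with ten customers), and the final rounding step. linprog solves the continuous relaxation with 0 <= x <= 1, which is not guaranteed to return 0/1 values for every data set.

```python
import numpy as np
from scipy.optimize import linprog

probs = np.array([
    [0.00317, 0.061, 0.024],
    [0.00629, 0.087, 0.013],
    [0.00242, 0.055, 0.091],
    [0.00253, 0.027, 0.047],
    [0.00421, 0.022, 0.071],
    [0.00414, 0.094, 0.077],
    [0.00739, 0.099, 0.067],
    [0.00549, 0.072, 0.046],
    [0.00658, 0.018, 0.077],
    [0.00852, 0.052, 0.044],
])
n_cust, n_opt = probs.shape

# Illustrative limits, assumed for this sketch (not given in the question)
N_TOTAL, M_MIN_A, B_MAX = 7, 3, 2

# linprog minimizes, so negate the scores; x is laid out customer by customer
c = -probs.ravel()

# Inequalities A_ub @ x <= b_ub
A_ub = np.zeros((n_cust + 2, n_cust * n_opt))
b_ub = np.ones(n_cust + 2)
for i in range(n_cust):
    A_ub[i, i * n_opt:(i + 1) * n_opt] = 1  # at most one option per customer
A_ub[n_cust, 0::n_opt] = -1                 # sum(A) >= M  <=>  -sum(A) <= -M
b_ub[n_cust] = -M_MIN_A
A_ub[n_cust + 1, 1::n_opt] = 1              # sum(B) <= B_MAX
b_ub[n_cust + 1] = B_MAX

# Equality A_eq @ x == b_eq: total number of selections
A_eq = np.ones((1, n_cust * n_opt))
b_eq = [N_TOTAL]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))

# Round the relaxed solution and reshape to one row per customer
choices = np.round(res.x).astype(int).reshape(n_cust, n_opt)
print(res.message)
print(choices)
```

If the relaxed solution ever comes back fractional, rounding can violate the constraints; in that case an integer programming solver would be the safer tool.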

3
votes

Output for the question's original ten-customer data (input table, solver result object, and rounded choices):

            prob_CHOICEA  prob_CHOICEB  prob_CHOICEC
customerid
101              0.00317         0.061         0.024
102              0.00629         0.087         0.013
103              0.00242         0.055         0.091
104              0.00253         0.027         0.047
105              0.00421         0.022         0.071
106              0.00414         0.094         0.077
107              0.00739         0.099         0.067
108              0.00549         0.072         0.046
109              0.00658         0.018         0.077
110              0.00852         0.052         0.044

     con: array([-1.30002675e-11])
     fun: -0.3812999999903971
 message: 'Optimization terminated successfully.'
     nit: 7
   slack: array([1.00000000e+00, 7.99305067e-11, 1.47325485e-11, 1.00000000e+00,
       1.00000000e+00, 2.49527066e-11, 2.42738052e-11, 5.84235438e-10,
       4.23596713e-11, 5.77714543e-11, 8.80984175e-12, 1.46305190e-11])
  status: 0
 success: True
       x: array([2.89971936e-10, 1.32732722e-11, 6.97732845e-12, 1.00000000e+00,
       3.28055311e-10, 5.72702383e-12, 1.80418885e-11, 4.61391860e-12,
       1.00000000e+00, 2.01674011e-10, 4.58311340e-12, 1.29599793e-11,
       2.95298295e-10, 4.34109315e-12, 1.21776975e-11, 3.39951283e-11,
       1.00000000e+00, 2.55262044e-10, 4.94703751e-11, 1.00000000e+00,
       1.57932544e-11, 9.99999999e-01, 2.21487598e-11, 1.33679145e-11,
       2.30514296e-10, 3.91129933e-12, 1.00000000e+00, 1.00000000e+00,
       8.19015577e-12, 1.07293976e-11])

Choices:
[[0 0 0]
 [1 0 0]
 [0 0 1]
 [0 0 0]
 [0 0 0]
 [0 1 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [1 0 0]]

This problem can be solved using linear programming (LP), but the hardest part is not knowing that you should use LP; it is transforming your problem into an LP-optimization problem, and I will show you how to do just that. Before proceeding, I will change the example data you provided to keep things simple (because of the large number of variables generated), so assume we have the following input data:

Assume the size of the input problem is N, where N represents the number of choices:
