约束线性回归:求方程y = c + m1*x1 + m2*x2 + m3*x3中c,m1,m2,m3的最优值

问题描述 投票:0回答:1

我有一个包含 4 列的数据框。

  1. speed_20-40
  2. speed_40-50
  3. speed_50-60
  4. 燃油效率

speed 列的值以百分比表示,表示司机一天中在每个桶中花费的持续时间百分比

我想在给定以下约束的情况下求解线性回归方程

  1. 拦截 (c) 2
  2. 系数speed_20-40(m1) -1
  3. 系数speed_40-50(m2) 0
  4. 系数speed_50-60(m3) m1

目标函数:sum|(ypred - yact)| (mae) 应该是最小的

由于我对线性规划完全陌生,所以我发现很难制定它。我发现很少有像 pulp 和 pyomo 这样的库,但还没有找到合并数据框的方法。

linear-programming pyomo pulp scipy-optimize-minimize
1个回答
0
投票

不得不猜测一堆,因为不清楚你的问题大小或可变范围是什么,也不清楚“燃油效率”是否是y

我演示了一个既不使用

pyomo
也不使用
pulp
的解决方案,而是使用对 Scipy 的
milp
的基本调用,后者又使用 HiGHS LP 优化器。

import numpy as np
import pandas as pd
from numpy.random import default_rng
from scipy.optimize import milp, LinearConstraint, Bounds
from scipy import sparse

# Synthesize data, arbitrarily long sample set
rand = default_rng(seed=1)
n = 5_000
speed = pd.DataFrame(
    data=rand.uniform(size=(n, 3)),
    columns=('20-40', '40-50', '50-60'),
)
speed = 100 * speed.divide(speed.sum(axis=1), axis=0)  # make balanced percentages
c_ideal = 2.2
m_ideal = [-0.6, 0.9, 0.5]
efficiency_ideal = c_ideal + speed@m_ideal

# Add +/- 5% error to demonstrate behaviour with measured data
noise = rand.uniform(low=-5, high=5, size=n)
efficiency = efficiency_ideal + noise

'''
Set up LP problem
error = [s s s 1][m m m c] - efficiency
To minimize |error|,
error <= slack
-error <= slack
minimize slack.
'''

# m1 m2 m3 c slack:n
c = np.ones(4 + n)
c[:4] = 0  # only slack is minimised

lower = np.full_like(c, -np.inf)
lower[0] = -1  # m1
lower[1] =  0  # m2
#    [2]         m3 no simple bound
lower[3] =  2  # c
# slack: no simple bound

upper = np.full_like(c, np.inf)
upper[0] = 0  # m1
upper[1] = 1  # m2
#    [2]        m3 no simple bound
upper[3] = 3  # c
# slack: no simple bound

# m1s1 + m2s2 + m3s3 + c
A_error = np.empty((n, 4))
A_error[:, :3] = speed  # s1, s2, s3
A_error[:,  3] = 1      # c

A_m_constraints = np.array((
    (-1,  0,  1,  0),  # m1 <= m3
    ( 0,  1, -1,  0),  # m3 <= m2
))

# slack variable used to get the effect of absolute error
A_slack = sparse.eye(n)

A = sparse.hstack((
    sparse.vstack((A_error, -A_error, A_m_constraints)),
    sparse.vstack((A_slack,  A_slack, sparse.dok_matrix((2, n)))),
))

lb = np.zeros(2*n + 2)
lb[ :n      ] =  efficiency  #  efficiency <=  m1s1 + m2s2 + m3s3 + c + slack
lb[n:n*2    ] = -efficiency  # -efficiency <= -m1s1 - m2s2 - m3s3 - c + slack

result = milp(c=c, bounds=Bounds(lb=lower, ub=upper),
              constraints=LinearConstraint(lb=lb, A=A))
print(result.message)
m = result.x[:3]
c = result.x[3]
slack = result.x[4:]
print('m =', m, '~', m_ideal)
print('c =', c, '~', c_ideal)
print(f'Remaining slack: {slack.dot(slack)/n:.2f}')
Optimization terminated successfully. (HiGHS Status 7: Optimal)
m = [-0.5949474   0.89613074  0.50330008] ~ [-0.6, 0.9, 0.5]
c = 2.0 ~ 2.2
Remaining slack: 8.47
© www.soinside.com 2019 - 2024. All rights reserved.