我有一个包含 4 列的数据框。
speed 列的值以百分比表示,表示司机一天中在每个桶中花费的持续时间百分比
我想在给定以下约束的情况下求解线性回归方程
目标函数:sum|(ypred - yact)| (mae) 应该是最小的
由于我对线性规划完全陌生,所以我发现很难制定它。我发现很少有像 pulp 和 pyomo 这样的库,但还没有找到合并数据框的方法。
不得不猜测一堆,因为不清楚你的问题大小或可变范围是什么,也不清楚“燃油效率”是否是y。
我演示了一个既不使用
pyomo
也不使用 pulp
的解决方案,而是使用对 Scipy 的 milp
的基本调用,后者又使用 HiGHS LP 优化器。
import numpy as np
import pandas as pd
from numpy.random import default_rng
from scipy.optimize import milp, LinearConstraint, Bounds
from scipy import sparse
# Synthesize data, arbitrarily long sample set
rand = default_rng(seed=1)
n = 5_000
speed = pd.DataFrame(
data=rand.uniform(size=(n, 3)),
columns=('20-40', '40-50', '50-60'),
)
speed = 100 * speed.divide(speed.sum(axis=1), axis=0) # make balanced percentages
c_ideal = 2.2
m_ideal = [-0.6, 0.9, 0.5]
efficiency_ideal = c_ideal + speed@m_ideal
# Add +/- 5% error to demonstrate behaviour with measured data
noise = rand.uniform(low=-5, high=5, size=n)
efficiency = efficiency_ideal + noise
'''
Set up LP problem
error = [s s s 1][m m m c] - efficiency
To minimize |error|,
error <= slack
-error <= slack
minimize slack.
'''
# m1 m2 m3 c slack:n
c = np.ones(4 + n)
c[:4] = 0 # only slack is minimised
lower = np.full_like(c, -np.inf)
lower[0] = -1 # m1
lower[1] = 0 # m2
# [2] m3 no simple bound
lower[3] = 2 # c
# slack: no simple bound
upper = np.full_like(c, np.inf)
upper[0] = 0 # m1
upper[1] = 1 # m2
# [2] m3 no simple bound
upper[3] = 3 # c
# slack: no simple bound
# m1s1 + m2s2 + m3s3 + c
A_error = np.empty((n, 4))
A_error[:, :3] = speed # s1, s2, s3
A_error[:, 3] = 1 # c
A_m_constraints = np.array((
(-1, 0, 1, 0), # m1 <= m3
( 0, 1, -1, 0), # m3 <= m2
))
# slack variable used to get the effect of absolute error
A_slack = sparse.eye(n)
A = sparse.hstack((
sparse.vstack((A_error, -A_error, A_m_constraints)),
sparse.vstack((A_slack, A_slack, sparse.dok_matrix((2, n)))),
))
lb = np.zeros(2*n + 2)
lb[ :n ] = efficiency # efficiency <= m1s1 + m2s2 + m3s3 + c + slack
lb[n:n*2 ] = -efficiency # -efficiency <= -m1s1 - m2s2 - m3s3 - c + slack
result = milp(c=c, bounds=Bounds(lb=lower, ub=upper),
constraints=LinearConstraint(lb=lb, A=A))
print(result.message)
m = result.x[:3]
c = result.x[3]
slack = result.x[4:]
print('m =', m, '~', m_ideal)
print('c =', c, '~', c_ideal)
print(f'Remaining slack: {slack.dot(slack)/n:.2f}')
Optimization terminated successfully. (HiGHS Status 7: Optimal)
m = [-0.5949474 0.89613074 0.50330008] ~ [-0.6, 0.9, 0.5]
c = 2.0 ~ 2.2
Remaining slack: 8.47