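I have a regression problem. As shown in the figure, the orange line is the ground truth and the blue scatter points are my data. The overall trend of the data matches the ground truth, but there is a lot of noise. How can I use regression to get a curve from my data that roughly agrees with the ground truth? I tried taking the log of the data and then fitting a linear regression, but the result was very different from the ground truth, and even from my blue scatter points. Here is one of my data sets: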
array([173.4098 , 175.45181 , 174.13388 , 173.40126 , 168.39598 ,
170.03275 , 174.5293 , 165.87642 , 159.7338 , 161.6138 ,
162.9032 , 163.47513 , 154.69208 , 158.92336 , 154.58157 ,
150.18645 , 152.07083 , 150.4797 , 151.29477 , 148.55183 ,
146.464 , 143.84012 , 145.89365 , 142.02222 , 144.87402 ,
139.96799 , 142.3692 , 137.01068 , 142.52402 , 132.98807 ,
137.32303 , 132.90698 , 133.54742 , 130.06049 , 130.66891 ,
130.84998 , 129.88948 , 123.85362 , 125.86339 , 129.53204 ,
128.61484 , 128.87492 , 126.015274, 123.903114, 120.53897 ,
120.38696 , 129.81078 , 120.591125, 119.701645, 119.92349 ,
120.14763 , 119.11101 , 120.00702 , 117.86407 , 116.26706 ,
115.46516 , 112.11573 , 113.74388 , 111.47281 , 114.65326 ,
109.15923 , 111.74715 , 110.95357 , 111.46296 , 109.9637 ,
109.58853 , 108.28537 , 111.840836, 107.205475, 111.05708 ,
107.724075, 109.72452 , 106.84272 , 105.18547 , 103.491745,
107.05888 , 103.77411 , 99.423706, 102.43909 , 101.53308 ,
102.69588 , 108.018585, 103.53029 , 99.62952 , 104.83856 ,
104.23057 , 101.18348 , 102.52391 , 104.558334, 98.90404 ,
101.16083 , 97.97317 , 95.47827 , 100.85654 , 102.93936 ,
98.681854, 97.37257 , 97.05141 , 92.266624, 98.8342 ,
94.678894, 92.807495, 92.73536 , 95.94114 , 95.84711 ,
94.92062 , 97.35336 , 92.18617 , 87.458984, 92.3882 ,
95.487915, 97.04467 , 93.190155, 91.25882 , 96.17412 ,
92.43962 , 94.880844, 91.82003 , 95.34397 , 92.99954 ],
dtype=float32)
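For reference, the log-then-linear-regression approach from the question can be sketched as follows. This is a minimal sketch under assumptions: the log is taken of y only (the question doesn't say which variable was logged), x is assumed to be 1..120, the data are synthetic, and `log_linear_exp_fit` is a hypothetical helper name. The important step is back-transforming with `exp`: the line fitted to log(y) has to be exponentiated before it is plotted against the original scatter. Note also that this approach can only ever produce a pure exponential a * exp(-b * x), so it may not match data whose decay rate changes along the curve.

```python
import numpy as np

def log_linear_exp_fit(x, y):
    """Fit y ~ a * exp(-b * x) by ordinary least squares on log(y).

    log(y) = log(a) - b * x is a straight line in x, so np.polyfit with
    degree 1 fits it directly. All y values must be positive.
    """
    slope, intercept = np.polyfit(x, np.log(y), 1)  # coefficients, highest power first
    return np.exp(intercept), -slope                 # back-transform to (a, b)

# Hypothetical synthetic data; x = 1..120 is an assumption, as in the answer below.
x = np.linspace(1.0, 120.0, 120)
y_clean = 175.0 * np.exp(-0.005 * x)

a_hat, b_hat = log_linear_exp_fit(x, y_clean)
y_fit = a_hat * np.exp(-b_hat * x)  # curve on the original (not log) scale
```

If the fitted line is plotted on the log scale, or without applying `exp` to the intercept, it will not overlay the original data at all.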
You can use curve_fit from scipy.optimize. I tried an exponential-decay curve of the form
y = a * exp(-b * x^n)
and searched for the parameters a, b and n.
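It works better if you can give the parameters some initial search ranges (bounds).
Feel free to swap in whatever fitting function suits you.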
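As an illustration of the point about search ranges: `bounds` limits where curve_fit may search, while its `p0` argument sets the starting point of the search. A minimal sketch combining both, assuming noise-free synthetic data generated from made-up parameters (190, 0.05, 0.6) so the result can be checked; the `p0` values are assumptions, not tuned for the question's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(X, A, B, n):
    # Stretched-exponential decay y = A * exp(-B * X**n), same model as the answer.
    return A * np.exp(-B * X ** n)

# Hypothetical noise-free data generated from known parameters.
x = np.linspace(1.0, 120.0, 120)
true_params = (190.0, 0.05, 0.6)
y = func(x, *true_params)

# p0: where the search starts (must lie inside the bounds).
# bounds: (lower limits for A, B, n), (upper limits for A, B, n).
p0 = (180.0, 0.1, 0.5)
bounds = ((100.0, 0.0, 0.0), (300.0, 1.0, 2.0))
params, pcov = curve_fit(func, x, y, p0=p0, bounds=bounds)
```

When bounds are given without `p0`, curve_fit starts from a point inside the bounds on its own, so a sensible `p0` mainly helps when the default starting point converges slowly or to a poor local minimum.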
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func( X, A, B, n ):
    # Stretched-exponential decay: y = A * exp( -B * X^n )
    return A * np.exp( - B * X ** n )
x = np.linspace( 1.0, 120.0, 120 ) # You don't state this, so I'm guessing
y = np.array( [173.4098 , 175.45181 , 174.13388 , 173.40126 , 168.39598 ,
170.03275 , 174.5293 , 165.87642 , 159.7338 , 161.6138 ,
162.9032 , 163.47513 , 154.69208 , 158.92336 , 154.58157 ,
150.18645 , 152.07083 , 150.4797 , 151.29477 , 148.55183 ,
146.464 , 143.84012 , 145.89365 , 142.02222 , 144.87402 ,
139.96799 , 142.3692 , 137.01068 , 142.52402 , 132.98807 ,
137.32303 , 132.90698 , 133.54742 , 130.06049 , 130.66891 ,
130.84998 , 129.88948 , 123.85362 , 125.86339 , 129.53204 ,
128.61484 , 128.87492 , 126.015274, 123.903114, 120.53897 ,
120.38696 , 129.81078 , 120.591125, 119.701645, 119.92349 ,
120.14763 , 119.11101 , 120.00702 , 117.86407 , 116.26706 ,
115.46516 , 112.11573 , 113.74388 , 111.47281 , 114.65326 ,
109.15923 , 111.74715 , 110.95357 , 111.46296 , 109.9637 ,
109.58853 , 108.28537 , 111.840836, 107.205475, 111.05708 ,
107.724075, 109.72452 , 106.84272 , 105.18547 , 103.491745,
107.05888 , 103.77411 , 99.423706, 102.43909 , 101.53308 ,
102.69588 , 108.018585, 103.53029 , 99.62952 , 104.83856 ,
104.23057 , 101.18348 , 102.52391 , 104.558334, 98.90404 ,
101.16083 , 97.97317 , 95.47827 , 100.85654 , 102.93936 ,
98.681854, 97.37257 , 97.05141 , 92.266624, 98.8342 ,
94.678894, 92.807495, 92.73536 , 95.94114 , 95.84711 ,
94.92062 , 97.35336 , 92.18617 , 87.458984, 92.3882 ,
95.487915, 97.04467 , 93.190155, 91.25882 , 96.17412 ,
92.43962 , 94.880844, 91.82003 , 95.34397 , 92.99954 ] )
# bounds = ( lower limits for (a, b, n), upper limits for (a, b, n) )
params, cv = curve_fit( func, x, y, bounds = ( ( 100.0, 0.0, 0.0 ), ( 300.0, 1.0, 2.0 ) ) )
a, b, n = params
print( "Fit: y = a.exp( -b x^n ) where [a, b, n] =", *params )
# plot the results
xfit = np.linspace( min( x ), max( x ), 100 )
yfit = func( xfit, *params )
plt.plot( x, y, 'bo', label="data" )
plt.plot( xfit, yfit, 'k-', label="fitted")
plt.legend()  # show the "data" / "fitted" labels
plt.show()
Output:
Fit: y = a.exp( -b x^n ) where [a, b, n] = 192.12413638168618 0.047421045318959514 0.5818668562501786
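To judge how well such a fit follows the data, the covariance matrix returned by curve_fit (`cv` in the answer) gives one-sigma parameter uncertainties as the square root of its diagonal, and R² compares the residual variance with the total variance. A minimal sketch on seeded synthetic noisy data standing in for the question's array; `fit_summary` is a hypothetical helper, and the noise level 3.0 and `p0` are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(X, A, B, n):
    # Same stretched-exponential model as in the answer.
    return A * np.exp(-B * X ** n)

def fit_summary(x, y, params, pcov):
    """Return 1-sigma parameter errors and the coefficient of determination R^2."""
    perr = np.sqrt(np.diag(pcov))            # standard errors from the covariance matrix
    residuals = y - func(x, *params)
    ss_res = np.sum(residuals ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return perr, 1.0 - ss_res / ss_tot

# Hypothetical noisy data (seeded for reproducibility).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 120.0, 120)
y = func(x, 190.0, 0.05, 0.6) + rng.normal(0.0, 3.0, x.size)

params, cv = curve_fit(func, x, y, p0=(180.0, 0.1, 0.5),
                       bounds=((100.0, 0.0, 0.0), (300.0, 1.0, 2.0)))
perr, r2 = fit_summary(x, y, params, cv)
```

With the answer's own variables, `fit_summary(x, y, params, cv)` can be called right after its curve_fit line.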