Fitting a model function beyond the defined data range

Question (votes: -1, answers: 1)

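Suppose I have a set of data (x = times, y = observations) that has several gaps in time. Whatever the actual trend of the data is, we assume it to be linear for this discussion. During a time gap a decay sets in that pulls the data away from the pure linear trend, until observations start again and the linear trend resumes.

There are several gaps in time, but in this example I only report the shortest snapshot that illustrates the problem. A time gap is the time between the (positive) linear trends during which no observations are available, so the difference between consecutive x = times is, for example, much larger than the average. I would like to model the decay (y_decay = C - D*x) as part of the function.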
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def f(x, A, B, C, D):
    # curve_fit passes x as an array, so the branch is selected element-wise
    in_data = ((x>=1) & (x<=3)) | ((x>=5) & (x<=9)) | ((x>=23) & (x<=25))
    # linear trend inside the observation windows, decay everywhere else
    return np.where(in_data, A*x + B, C - D*x)

x=[1,2,3, 12,13,14, 23,24,25]
y=[2,4,6, 5, 7, 9, 8, 10,12]
popt, pcov = curve_fit(f, x, y)

figure = plt.figure(figsize=(5.15, 5.15))
figure.clf()
plot = plt.subplot(111)
ax1 = plt.gca()

plot.scatter(x,y)
plt.show()


How can I model the decay as part of the function and obtain the best-fit values of its parameters?

python function scipy data-fitting
1 answer
0 votes

Assuming that all data blocks share the same slope m and all "decays" share the same slope D, this is my ansatz.
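Written out, the quantity minimized by the residuals function in the code below can be summarized as follows. The index notation (i for blocks, k for points within a block, "last" and "first" for the points bordering a gap) is introduced here for illustration only; the symbols m, D, alpha and beta correspond to m, D, alphal and betal in the code. Note that D is the slope of the gap line itself, so a decay gives a negative D, which is the opposite sign convention to C - D*x in the question.

```latex
% data blocks: common slope m, one offset per block i = 1..n
y_{i,k} \approx m\, t_{i,k} + \alpha_i
% gaps: common slope D, one offset per gap j = 1..n-1
y \approx D\, t + \beta_j
% sum of squares passed to the least-squares solver
S = \sum_{i=1}^{n} \sum_{k} \left( y_{i,k} - m\, t_{i,k} - \alpha_i \right)^2
  + \sum_{j=1}^{n-1} \left[
      \left( y_{j,\mathrm{last}} - \beta_j - D\, t_{j,\mathrm{last}} \right)^2
    + \left( y_{j+1,\mathrm{first}} - \beta_j - D\, t_{j+1,\mathrm{first}} \right)^2
    \right]
```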

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq

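def generic_data( m, D, n ):
    # test data: n + 1 linear blocks of slope m, separated by gaps in which the signal drops by D * dt
    alpha0 = 0
    timel = [ 0 ] ### dummy start value to avoid an error, removed at the end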
    dl = list()
    for gaps in range( n + 1 ): 
        for pnts in range( 3 + np.random.randint( 2 ) ): 
            timel.append ( timel[-1] +  0.5 + np.random.rand() )
            dl.append( m * timel[ -1 ] + alpha0 )
        ###now the gap
        dt =  2. + 2 * np.random.rand()
        timel.append ( timel[-1] + dt )
        dl.append( dl[-1] - D * dt )
        alpha0 = dl[-1] - m * timel[-1]
    del timel[0]
    ### remove jump of last gap
    del timel[-1]
    del dl[-1]
    dl = np.fromiter( ( y + np.random.normal( scale=0.1 ) for y in dl ), float )
    return np.array( timel ), dl

def split_into_blocks( tl, dl ):
    ### a drop between two consecutive data points marks the end of a block
    mask = np.where( dl[1:] - dl[:-1] < 0, 1, 0 )
    where = np.argwhere( mask ).flatten()
    ### add sentinels so that consecutive index pairs delimit every block
    where = np.append( where, len( dl ) - 1 )
    where = np.insert( where, 0, -1 )
    tll = list()
    dll = list()
    for s, e in zip( where[:-1], where[1:] ):
        dll.append( dl[ s + 1 : e + 1 ] )
        tll.append( tl[ s + 1 : e + 1 ] )
    ### blocks can differ in length, so return plain lists rather than a ragged array
    return tll, dll



def residuals( params, tblocks, dblocks ):
    ### params = [ m, D, one offset per data block, one offset per gap ]
    nob = len( tblocks )
    m = params[0]
    D = params[1]
    alphal = params[2:2+nob]
    betal = params[2+nob:]
    out = list()
    ### data blocks: common slope m, individual offset alphal[i]
    for i, (tl, dl) in enumerate( zip(tblocks, dblocks ) ):
        diff = [ d - ( m * t + alphal[i] ) for t, d in zip( tl, dl ) ]
        out = out + diff
    ### gaps: common slope D (negative for a decay), individual offset betal[j]
    for j in range( nob -1 ):
        out.append( dblocks[ j][-1] - ( betal[j] + D * tblocks[j][-1] ) ) ###left point gap
        out.append( dblocks[ j+1][0] - ( betal[j] + D * tblocks[j+1][0] ) ) ###right point gap
    return out


tl, dl =  generic_data( 1.3, .3, 3 )
tll, dll = split_into_blocks( tl, dl )

nob = len(dll)
m0 = +1.0
D0 = -0.1
guess = [m0, D0 ]+ nob * [-3] + ( nob - 1 ) * [ +4 ]
sol, err = leastsq( residuals, x0=guess, args=( tll, dll ) )


mf = sol[0]
Df = sol[1]

print( mf, Df )
alphal = sol[2:2+nob]
betal = sol[-nob+1::]

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )

###original data
ax.plot( tl, dl, ls='', marker='o')
###identify blocks
for a,b in zip( tll, dll ):
    ax.plot( a, b, ls='', marker='x')
###fit results
for j in range(nob):
    tloc = np.linspace( tll[j][0] - 0.3, tll[j][-1] + 0.3 , 3 )
    ax.plot( tloc, [ mf * t + alphal[j] for t in tloc ] )
for j in range(nob - 1):
    tloc = np.linspace( tll[j][-1] - 0.3, tll[j+1][0] + 0.3 , 3 )
    ax.plot( tloc, [ Df * t +betal[j] for t in tloc ] )
plt.show()

This results in

>> 1.2864142170851447 -0.2818180721270913

some fitted test data

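However, the model may require that the decay line does not cross the data lines within the data range. Since we would have to check some kind of boundary condition, this requires extra fiddling. On the other hand, one could fit only the data and then look for the decay line with the smallest slope that satisfies the boundary condition mentioned above. In that case I would remove the D-fit part from the residuals and calculate D afterwards.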
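A minimal sketch of that alternative, offered as one possible reading rather than a definitive implementation: fit only the data blocks with a common slope and one offset per block, then derive a decay slope for each gap afterwards from the ends of the fitted lines, so the decay line meets the data lines only at the block edges. The helper names split_on_time_gaps, fit_shared_slope and gap_slopes are hypothetical, as is the gap criterion (a time step larger than factor times the median step, standing in for the "much larger than the average" rule in the question). The data are those from the question.

```python
import numpy as np

def split_on_time_gaps(t, y, factor=3.0):
    """Split (t, y) into blocks wherever a time step exceeds factor * median step.
    The threshold rule is an assumption, not part of the original answer."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dt = np.diff(t)
    cuts = np.flatnonzero(dt > factor * np.median(dt)) + 1
    return np.split(t, cuts), np.split(y, cuts)

def fit_shared_slope(tblocks, yblocks):
    """Linear least squares for y = m*t + alpha_i: one slope m, one offset per block."""
    nob = len(tblocks)
    t_all = np.concatenate(tblocks)
    y_all = np.concatenate(yblocks)
    design = np.zeros((t_all.size, 1 + nob))
    design[:, 0] = t_all                      # column for the common slope
    row = 0
    for i, tb in enumerate(tblocks):
        design[row:row + tb.size, 1 + i] = 1  # indicator column for block i
        row += tb.size
    coeffs, *_ = np.linalg.lstsq(design, y_all, rcond=None)
    return coeffs[0], coeffs[1:]

def gap_slopes(tblocks, m, alphas):
    """Slope of the straight line joining the fitted line ends across each gap."""
    slopes = []
    for j in range(len(tblocks) - 1):
        t_left, t_right = tblocks[j][-1], tblocks[j + 1][0]
        y_left = m * t_left + alphas[j]
        y_right = m * t_right + alphas[j + 1]
        slopes.append((y_right - y_left) / (t_right - t_left))
    return np.array(slopes)

x = [1, 2, 3, 12, 13, 14, 23, 24, 25]
y = [2, 4, 6, 5, 7, 9, 8, 10, 12]
tblocks, yblocks = split_on_time_gaps(x, y)
m, alphas = fit_shared_slope(tblocks, yblocks)
slopes = gap_slopes(tblocks, m, alphas)
print(m, alphas, slopes)
```

For the question's data this finds three blocks with common slope 2 and offsets 0, -19 and -38, and both gaps get a slope of about -0.111, which corresponds to D of about 0.111 in the question's C - D*x convention. With noisy data the per-gap slopes will differ, and how to combine them into a single D (for example by averaging) is a further modelling choice.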
