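I am using the selgmented() function from the segmented package to select the number of breakpoints in a time series regression. The function behaves as expected and returns a segmented object. However, if I run the same dataset through segmented() with npsi set to the number of breakpoints chosen by selgmented(), it sometimes returns different results (that is, different segment slopes and different breakpoints). I tried changing some parameters with seg.control(), but I usually cannot get the two functions to agree. I would like to know how these functions differ and why they produce different results. The differences are usually small, but I want to understand where they come from.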
Here is an example using the plant organ dataset (plant) included in the segmented package.
library(segmented)
data(plant)
plant_RKV <- plant[which(plant$group=="RKV"), ]
out.lm <- lm(y ~ time, data=plant_RKV)
os <- selgmented(out.lm, type="bic", Kmax=5)
o <- segmented(out.lm, npsi=4)
> summary(os)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]],
control = control1)
Estimated Break-Point(s):
Est. St.Err
psi1.time 314.495 13.909
psi2.time 471.338 15.939
psi3.time 542.613 9.392
psi4.time 565.911 7.278
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0694477 0.0281773 2.465 0.022 *
time 0.0024848 0.0001249 19.895 1.49e-15 ***
U1.time -0.0015098 0.0002187 -6.903 NA
U2.time -0.0030369 0.0011887 -2.555 NA
U3.time 0.0056947 0.0022973 2.479 NA
U4.time -0.0043988 0.0019883 -2.212 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02418 on 22 degrees of freedom
Multiple R-Squared: 0.9856, Adjusted R-squared: 0.9798
Convergence attained in 3 iterations (rel. change 2.3914e-10)
> summary(o)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = out.lm, npsi = 4)
Estimated Break-Point(s):
Est. St.Err
psi1.time 295.723 16.654
psi2.time 458.156 11.044
psi3.time 544.410 8.446
psi4.time 566.120 7.659
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0497251 0.0359206 1.384 0.18
time 0.0025961 0.0001788 14.522 9.4e-13 ***
U1.time -0.0014525 0.0002439 -5.956 NA
U2.time -0.0027260 0.0006178 -4.412 NA
U3.time 0.0050112 0.0020535 2.440 NA
U4.time -0.0041930 0.0019796 -2.118 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02407 on 22 degrees of freedom
Multiple R-Squared: 0.9858, Adjusted R-squared: 0.9799
Boot restarting based on 6 samples. Last fit:
Convergence *not* attained in 1 iterations (rel. change 0.062135)
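To make the differences easier to see than in the two summaries, the fits can be put side by side. This is only a sketch: it assumes both fits return four breakpoints, as in the output above, and it uses the residual sum of squares as a rough indicator of which fit reached the better optimum. The "Est." column name is the one used in the psi matrix of a segmented object.

```r
library(segmented)

data(plant)
plant_RKV <- plant[plant$group == "RKV", ]
out.lm <- lm(y ~ time, data = plant_RKV)

# Same two fits as above
os <- selgmented(out.lm, type = "bic", Kmax = 5)
o  <- segmented(out.lm, npsi = 4)

# Estimated breakpoints from each fit, side by side
# (assumes both fits have the same number of breakpoints)
psi_compare <- cbind(selgmented = os$psi[, "Est."],
                     segmented  = o$psi[, "Est."])
print(psi_compare)

# Residual sum of squares of each fit; the smaller value fits the data more closely
rss <- c(selgmented = sum(residuals(os)^2),
         segmented  = sum(residuals(o)^2))
print(rss)
```

If the two residual sums of squares are close while the breakpoints differ, that would suggest the objective is flat around the breakpoints, so small algorithmic differences can move the estimates.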
If I set the number of bootstrap samples and the maximum number of iterations, I get matching output.
out.lm <- lm(y ~ time, data=plant_RKV)
os <- selgmented(out.lm, type="bic", Kmax=5, control=seg.control(n.boot=1000, it.max = 5))
o <- segmented(out.lm, npsi=4, control=seg.control(n.boot=1000, it.max = 5))
> summary(os)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = olm, seg.Z = seg.Z, psi = startpsi[[i - 1]],
control = control1)
Estimated Break-Point(s):
Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613 9.245
psi4.time 565.911 7.164
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0519970 0.0307320 1.692 0.105
time 0.0025830 0.0001437 17.971 1.23e-14 ***
U1.time -0.0015229 0.0002118 -7.190 NA
U2.time -0.0031220 0.0011670 -2.675 NA
U3.time 0.0056947 0.0022612 2.518 NA
U4.time -0.0043988 0.0019571 -2.248 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861, Adjusted R-squared: 0.9804
Convergence *not* attained in 5 iterations (rel. change 0.00043136)
> summary(o)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = out.lm, npsi = 4, control = seg.control(n.boot = 1000,
it.max = 5))
Estimated Break-Point(s):
Est. St.Err
psi1.time 300.220 14.015
psi2.time 469.728 15.680
psi3.time 542.613 9.245
psi4.time 565.911 7.164
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0519970 0.0307320 1.692 0.105
time 0.0025830 0.0001437 17.971 1.23e-14 ***
U1.time -0.0015229 0.0002118 -7.190 NA
U2.time -0.0031220 0.0011670 -2.675 NA
U3.time 0.0056947 0.0022612 2.518 NA
U4.time -0.0043988 0.0019571 -2.248 NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0238 on 22 degrees of freedom
Multiple R-Squared: 0.9861, Adjusted R-squared: 0.9804
Boot restarting based on 12 samples. Last fit:
Convergence attained in 4 iterations (rel. change 2.2993e-12)
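However, changing parameters with seg.control() does not seem to work for every example. This may be partly because I am not familiar with what these parameters do.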
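Here is a second example from my own dataset (a time series), where the breakpoints differ noticeably between the two functions. I could not find parameter settings that make the outputs match.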
x <- c(0, 43, 156, 238, 254, 323, 674, 870, 1193, 1555, 1926, 2279, 2660, 3103, 3479, 3832, 4214, 4929, 5665, 6401, 7122, 7850, 9314, 10062, 11151, 11869)
y <- c(7.31322039, 7.32974969, 6.85224257, 7.11882625, 6.56526497, 6.13122649, 6.30991828, 5.88610403, 4.78749174, 3.76120012, 4.02535169, 4.09434456, 2.23216263, 1.97685495, 2.86789890, 2.80336038, 1.28093385, 0.53062825, 0.83290912, 1.02961942, 0.64185389, 0.18232156, 0.09531018, 0.09531018, 0.33647224, 0.51282363)
out.lm2 <- lm(y ~ x)
os2 <- selgmented(out.lm2, type="bic", Kmax=5)
o2 <- segmented(out.lm2, npsi=2)
par(mfrow=c(1,2))
plot.segmented(os2)
plot.segmented(os2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE)
plot.segmented(o2)
plot.segmented(o2, res=TRUE, add=TRUE, conf.level=0.95, shade=TRUE)
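One check that might help separate the effect of starting values from the effect of bootstrap restarting is to pass the breakpoints estimated by selgmented() to segmented() as starting values through psi, with restarting switched off via n.boot = 0. This is a sketch under assumptions: it assumes selgmented() returns a segmented object with at least one breakpoint for these data, and the object names start_psi and o2_start are made up for illustration.

```r
library(segmented)

x <- c(0, 43, 156, 238, 254, 323, 674, 870, 1193, 1555, 1926, 2279, 2660,
       3103, 3479, 3832, 4214, 4929, 5665, 6401, 7122, 7850, 9314, 10062,
       11151, 11869)
y <- c(7.31322039, 7.32974969, 6.85224257, 7.11882625, 6.56526497,
       6.13122649, 6.30991828, 5.88610403, 4.78749174, 3.76120012,
       4.02535169, 4.09434456, 2.23216263, 1.97685495, 2.86789890,
       2.80336038, 1.28093385, 0.53062825, 0.83290912, 1.02961942,
       0.64185389, 0.18232156, 0.09531018, 0.09531018, 0.33647224,
       0.51282363)

out.lm2 <- lm(y ~ x)
os2 <- selgmented(out.lm2, type = "bic", Kmax = 5)

# Breakpoints estimated by selgmented(), reused as starting values
start_psi <- os2$psi[, "Est."]

# Refit with segmented() from those starting values, without bootstrap restarting
o2_start <- segmented(out.lm2, seg.Z = ~x, psi = start_psi,
                      control = seg.control(n.boot = 0))

# Compare the breakpoints of the two fits
print(rbind(selgmented = os2$psi[, "Est."],
            restarted  = o2_start$psi[, "Est."]))
```

If o2_start reproduces the selgmented() breakpoints, the difference from segmented(out.lm2, npsi=2) would point to the default starting values or the restarting step rather than to the model itself.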
I would appreciate any feedback on why the outputs of segmented() and selgmented() differ and how (if possible) to make them agree.
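Did you ever get any insight into this? I have the same problem. I get better results with segmented() (that is, closer to what I expect from looking at and knowing the data), but I cannot justify why it would be more reasonable to use than selgmented().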