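I am trying to fit a cubic model in R and use it for prediction. The predictions it gives are accurate, but the model's coefficients make no sense given the predicted values.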
Here is the output when I run summary(lm(Blur ~ poly(logMAR, 3), data = df)):
Call:
lm(formula = Blur ~ poly(logMAR, 3), data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-20.7838  -2.0661  -0.0868   2.3009  23.8918

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       35.0000     0.7775  45.015   <2e-16 ***
poly(logMAR, 3)1 230.6529     6.7782  34.029   <2e-16 ***
poly(logMAR, 3)2  94.3878     6.7782  13.925   <2e-16 ***
poly(logMAR, 3)3  19.6777     6.7782   2.903   0.0049 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.778 on 72 degrees of freedom
Multiple R-squared:  0.9497,  Adjusted R-squared:  0.9476
F-statistic: 453.4 on 3 and 72 DF,  p-value: < 2.2e-16
Calling coef(model) afterwards returns:
     (Intercept) poly(logMAR, 3)1 poly(logMAR, 3)2 poly(logMAR, 3)3
        35.00000        230.65294         94.38777         19.67767
The summary is accurate, but when I call predict(model, newdata = data.frame(logMAR = c(0.37, 0.74, 1.15))) it returns:
1 2 3
3.492585 11.469869 31.099493
However, if I compute the predictions by hand from the coefficients, plugging them into a cubic in logMAR, I get:
[1] 134.2600 265.3438 455.0060
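After comparing against various plots, the smaller outputs make more sense, and the coefficients reported by R's summary seem far too large, which is what throws off the manual calculation. I sanity-checked the model with alternative software and online tools, which returned the smaller coefficients β0 = 1.3415, β1 = 0.9073, β2 = 9.2517 and β3 = 10.8354.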
A manual calculation with the smaller coefficients agrees with R's predict() function:
1 2 3
3.492585 11.469869 31.099493
Is there a reason the coefficients in R's summary() are so large? Is my manual calculation wrong, or am I looking in the wrong place?
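As mentioned in the comments, poly uses orthogonal polynomials by default, not simply a0 + a1 * x + a2 * x ^ 2 + a3 * x ^ 3. You can, however, ask for raw polynomials with raw = TRUE.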
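To make the difference between the two bases concrete, here is a minimal sketch on made-up data (the variables x and y below are hypothetical and are not the asker's df). It checks that the default poly() columns are centred and orthonormal, that the raw = TRUE columns are just the powers of x, and that with the orthogonal basis the intercept is simply mean(y), which would explain why it need not resemble the constant term of the cubic.

```r
# Hypothetical toy data, for illustration only
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + 3 * x^2 + rnorm(50, sd = 0.1)

P <- poly(x, degree = 3)              # default: orthogonal basis
R <- poly(x, degree = 3, raw = TRUE)  # raw basis: x, x^2, x^3

# The orthogonal columns are orthonormal and sum to zero
gram_error <- max(abs(crossprod(P) - diag(3)))  # ~ 0
max_col_sum <- max(abs(colSums(P)))             # ~ 0

# The raw columns are exactly the powers of x
raw_is_powers <- isTRUE(all.equal(as.vector(R), c(x, x^2, x^3)))

# With the orthogonal basis the intercept equals the mean of the response
fit_o <- lm(y ~ poly(x, degree = 3))
intercept_is_mean <- isTRUE(all.equal(unname(coef(fit_o)[1]), mean(y)))

print(c(gram_error = gram_error, max_col_sum = max_col_sum))
print(c(raw_is_powers = raw_is_powers, intercept_is_mean = intercept_is_mean))
```

Because the orthogonal columns are rescaled, centred combinations of x, x^2 and x^3, their coefficients cannot be plugged directly into a0 + a1 * x + a2 * x ^ 2 + a3 * x ^ 3.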
The following toy example illustrates the issue:
set.seed(123)
d <- data.frame(x = runif(100))
d$y <- (d$x + 4) ^ 3 + rnorm(100, sd = .2)
mo <- lm(y ~ poly(x, degree = 3), data = d)
mr <- lm(y ~ poly(x, degree = 3, raw = TRUE), data = d)
bd <- data.frame(x = 2)
predict(mo, bd)
predict(mr, bd) ## same
sum(coef(mr) * 2 ^ (0:3)) ## raw polys can be interpreted like this
# [1] 220.7312
sum(coef(mo) * 2 ^ (0:3)) ## however, orthogonal ones not --> wrong
# [1] 480.2738
sum(coef(mo) * c(1, predict(poly(d$x, degree = 3), 2))) ## but you can use them this way
# [1] 220.7312
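The same arithmetic can be replayed with the numbers from the question. The sketch below uses only the coefficient values quoted there; eval_cubic is a hypothetical helper introduced for illustration, and the asker's df is not needed. It shows that treating the orthogonal coefficients as plain power coefficients reproduces the large hand-calculated values, while the raw coefficients from the other software reproduce the predict() output up to rounding.

```r
# Hypothetical helper: evaluate a0 + a1*x + a2*x^2 + a3*x^3 at each x
eval_cubic <- function(coefs, x) {
  as.vector(outer(x, 0:3, `^`) %*% coefs)
}

x_new <- c(0.37, 0.74, 1.15)

# Coefficients as printed by coef(model) in the question (orthogonal basis)
orth_coefs <- c(35.00000, 230.65294, 94.38777, 19.67767)
# Raw coefficients reported by the other software in the question
raw_coefs <- c(1.3415, 0.9073, 9.2517, 10.8354)

# Misusing the orthogonal coefficients as power coefficients
wrong <- eval_cubic(orth_coefs, x_new)
print(round(wrong, 2))   # about 134.26 265.34 455.01

# Using the raw coefficients
right <- eval_cubic(raw_coefs, x_new)
print(round(right, 2))   # about 3.49 11.47 31.10
```

On the asker's own data, refitting with lm(Blur ~ poly(logMAR, 3, raw = TRUE), data = df) should presumably return coefficients close to the smaller set, although that cannot be confirmed here without the data.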