Confidence bands around a regression in R


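I am trying to compute a confidence band around a linear regression in R. I know that predict does this in most cases, but I want an equation-based solution, partly because not all regression models (for example those in the deming package) are compatible with predict, and partly because I want to understand it (hence also no ggplot solutions and the like).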

I have already gotten quite far using the equations explained in the two sources cited in the code comments below.

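What I have accomplished: I successfully calculated the t-value, the MSE, and the standard error of the regression. I am fairly confident that the equations in my implementation are largely correct, because when I apply them over the range of x-values for which the regression model was computed, the result matches what predict returns exactly (see the left plot produced by the code).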

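Where it gets complicated: the problems start when I try to extrapolate, in other words, when I want to compute the confidence band beyond the range of x-values for which I have data. The computed band still has the correct shape, but it is displaced. More precisely, as you can see in the right plot below, the band has to be shifted along the x-axis by the difference between the mean of the original x-data and the mean of the x-values used for extrapolation. Likewise, along the y-axis it has to be shifted by the difference between the model's fitted values at those two means. If that is hard to follow, see the calculation under #Define displacement vector in the code below.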

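Given the equation for se, it is not clear to me why this displacement occurs. I would like to know whether there is a better solution than displacing the confidence band as I do now (also because, owing to the shift, my code does not actually compute the band over the intended interval). I would be very grateful if someone could help me improve this code.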

#Create data
Dat<-as.data.frame(matrix(c(1, 1, 1, 4, 4, 4, 7, 7, 7, 10, 10, 10, 2.1, 2.3, 2.2, 3.5, 3.1, 3.2, 4.2, 5.0, 4.8, 6.1, 6.6, 6.2), 12, 2))
colnames(Dat)<-c("X", "Y")

#Create linear model
mod<-lm(Y ~ X, data=Dat)

#Use predict to calculate confidence band for comparison
Pred<-predict(mod, newdata=data.frame(X=0:30), interval="confidence")

#Calculate confidence band according to equations
#https://stattrek.com/regression/slope-confidence-interval.aspx
#https://library2.lincoln.ac.nz/documents/Analysing-the-Variance.pdf
##Gather constants
n<-nrow(Dat)
##Define prediction values
Pred.vals<-list()
Pred.vals$S1<-seq(from=1, to=10, by=0.5)
Pred.vals$S2<-seq(from=0, to=30, by=0.5)
Pred.vals$S1.fitted<-coef(mod)[2]*Pred.vals$S1+coef(mod)[1]
Pred.vals$S2.fitted<-coef(mod)[2]*Pred.vals$S2+coef(mod)[1]
##Calculate t-value
t.val<-qt(p=1-((1-0.95)/2), df=n-2)
##Calculate residual standard error (the square root of the MSE, despite the variable name)
mse<-sqrt(sum((Dat[,"Y"]-mod$fitted.values)^2)/(n-2))
##Calculate standard error of fit: two versions, both work, but se2 is displaced
se1<-mse*sqrt((1/n)+(Pred.vals$S1-mean(Pred.vals$S1))^2/sum((Dat[,"X"]-mean(Dat[,"X"]))^2))
se2<-mse*sqrt((1/n)+(Pred.vals$S2-mean(Pred.vals$S2))^2/sum((Dat[,"X"]-mean(Dat[,"X"]))^2))

#Define displacement vector
X.Mean<-list()
X.Mean$Original<-mean(Dat[,"X"])
X.Mean$New<-mean(Pred.vals$S2)
X.Mean$X.Diff<-X.Mean$Original-X.Mean$New
X.Mean$Y.Diff<-(coef(mod)[2]*X.Mean$Original+coef(mod)[1])-(coef(mod)[2]*X.Mean$New+coef(mod)[1])

#Calculate confidence band
slope.upper1<-Pred.vals$S1.fitted+t.val*se1
slope.lower1<-Pred.vals$S1.fitted-t.val*se1
slope.upper2<-Pred.vals$S2.fitted+t.val*se2
slope.lower2<-Pred.vals$S2.fitted-t.val*se2

#Plot and compare
##win.graph() is Windows-only; dev.new() opens the platform's default device
dev.new(width=20, height=10, pointsize=10)
layout(matrix(1:2, 1, 2))
##Small plot
plot(Dat[,"X"], Dat[,"Y"], xlim=c(0, 11), ylim=c(2, 7))
curve(coef(mod)[2]*x+coef(mod)[1], col="grey50", lwd=1, add=TRUE)
##Confidence interval from "predict"
lines(0:30, Pred[,"lwr"], col="cornflowerblue", lty=2)
lines(0:30, Pred[,"upr"], col="cornflowerblue", lty=2)
##Confidence intervals from equations
lines(Pred.vals$S1, slope.upper1, col="darkgreen", lwd=2, lty=2)
lines(Pred.vals$S1, slope.lower1, col="darkgreen", lwd=2, lty=2)
legend("topleft", col=c("grey50", "cornflowerblue", "darkgreen"), lwd=c(1, 1, 2), lty=c(1, 2, 2), legend=c("Regression line", "Confidence from 'predict'", "Confidence from equations"))

##Large plot
plot(Dat[,"X"], Dat[,"Y"], xlim=c(0, 30), ylim=c(2, 15))
curve(coef(mod)[2]*x+coef(mod)[1], col="grey50", lwd=1, add=TRUE)
##Confidence interval from "predict"
lines(0:30, Pred[,"lwr"], col="cornflowerblue", lty=2)
lines(0:30, Pred[,"upr"], col="cornflowerblue", lty=2)
##Confidence intervals from equations
#lines(Pred.vals$S1, slope.upper1, col="darkgreen", lwd=2, lty=2)
#lines(Pred.vals$S1, slope.lower1, col="darkgreen", lwd=2, lty=2)
lines(Pred.vals$S2, slope.upper2, col="firebrick", lty=3)
lines(Pred.vals$S2, slope.lower2, col="firebrick", lty=3)
lines(Pred.vals$S2+X.Mean$X.Diff, slope.upper2+X.Mean$Y.Diff, col="darkgreen", lwd=2, lty=3)
lines(Pred.vals$S2+X.Mean$X.Diff, slope.lower2+X.Mean$Y.Diff, col="darkgreen", lwd=2, lty=3)
legend("topleft", col=c("grey50", "cornflowerblue", "firebrick", "darkgreen"), lwd=c(1, 1, 1, 2), lty=c(1, 2, 3, 3), legend=c("Regression line", "Confidence from 'predict'", "Confidence from equations", "Confidence from equations (displaced)"))

Tags: r, regression, confidence-interval

1 Answer

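This was a really silly mistake, but maybe it is still useful to someone. The only problem is that the se calculation must use the mean of the original x-data in the numerator as well, not the mean of the x-values at which the band is evaluated.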
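For reference, the textbook form of the standard error of the fitted value at a new point can be written as below. The symbols are notation introduced here, not taken from the post: x_0 is the point at which the band is evaluated, x_i and y_i are the observed data, \bar{x} is the mean of the observed x-values, and s is the residual standard error (the quantity the code calls mse).

```latex
\operatorname{se}\bigl(\hat{y}(x_0)\bigr)
  = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```

The band is narrowest where x_0 = \bar{x}. In the question's data \bar{x} = 5.5. The grid S1 (1 to 10) happens to have mean 5.5 as well, so se1 came out right by coincidence. The grid S2 (0 to 30) has mean 15, so centring on it moved the narrowest point from 5.5 to 15, which is the 9.5-unit shift along the x-axis captured by X.Mean$X.Diff.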

se1<-mse*sqrt((1/n)+(Pred.vals$S1-mean(Dat[,"X"]))^2/sum((Dat[,"X"]-mean(Dat[,"X"]))^2))
se2<-mse*sqrt((1/n)+(Pred.vals$S2-mean(Dat[,"X"]))^2/sum((Dat[,"X"]-mean(Dat[,"X"]))^2))
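The corrected calculation can be checked against predict over the extrapolated range. The following is a minimal sketch, assuming an ordinary least-squares fit with one predictor; conf_band is a hypothetical helper written for this illustration, not a function from the post or from any package.

```r
# Data from the question
Dat <- data.frame(
  X = rep(c(1, 4, 7, 10), each = 3),
  Y = c(2.1, 2.3, 2.2, 3.5, 3.1, 3.2, 4.2, 5.0, 4.8, 6.1, 6.6, 6.2)
)
mod <- lm(Y ~ X, data = Dat)

# Hypothetical helper: pointwise confidence band for a simple linear
# regression y = b0 + b1 * x, evaluated at arbitrary new x-values x0.
conf_band <- function(x0, x, y, level = 0.95) {
  n    <- length(x)
  fit  <- lm(y ~ x)
  b    <- unname(coef(fit))
  yhat <- b[1] + b[2] * x0
  s    <- sqrt(sum(residuals(fit)^2) / (n - 2))      # residual standard error
  sxx  <- sum((x - mean(x))^2)
  # Centre on the mean of the ORIGINAL x-data, not on mean(x0)
  se   <- s * sqrt(1 / n + (x0 - mean(x))^2 / sxx)
  tval <- qt(1 - (1 - level) / 2, df = n - 2)
  data.frame(x = x0, fit = yhat, lwr = yhat - tval * se, upr = yhat + tval * se)
}

x0   <- seq(0, 30, by = 0.5)
band <- conf_band(x0, Dat$X, Dat$Y)
ref  <- predict(mod, newdata = data.frame(X = x0), interval = "confidence")

# Compare the equation-based band with the one from predict
all.equal(band$lwr, unname(ref[, "lwr"]))
all.equal(band$upr, unname(ref[, "upr"]))
```

With this centring, no displacement vector is needed: the band can be drawn directly with lines(band$x, band$lwr) and lines(band$x, band$upr). The formula is specific to ordinary least squares, so it should not be assumed to carry over unchanged to other model types such as Deming regression.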