This is the regression in Stata:
reg infl infoprov access staff int_cso int_cor pom if sample==1 & io==4, robust
I tried to create the subset in R, but it gives me an error.
I create the subset like this:
df <- data.frame(mydata$sample,mydata$io)
df <- df[df$mydata.sample==1,]
df <- df[df$mydata.io==4,]
Then I run the model like this:
fit <- lm(formula=infl~infoprov+access+staff+int_cso+int_cor+pom, data = mydata, subset=df)
But it gives me this error: "Error in `[.default`(xj, i) : invalid subscript type 'list'"
I don't know how to translate this regression into R.
First, I'll generate some fake data that looks like what you are working with.
param_names <- c("reg", "infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df <- as.data.frame(matrix(rnorm(8*100), ncol = 8))
colnames(df) <- param_names
# add the other features
df$sample <- rep(1:2, times = 50)
df$io <- rep(1:5, times = 20)
head(df)
This outputs the following:
          reg       infl   infoprov      access        staff     int_cso     int_cor
1 -0.99241450 -0.1410155  0.1263028  1.85004931 -0.875000485  1.68249914 -0.41617040
2  0.36195391 -1.4262807 -0.6918764  1.27094860  0.943553888 -0.88935698 -0.29580728
3 -0.03744011  0.9701812 -0.8399298 -1.52706348 -1.663372016  0.06849402  0.94382547
4 -1.50628547  0.4746796 -0.8827392  0.04912427 -0.008277577 -0.51145104  0.07805638
5  0.95119030  0.8774569 -1.2075175 -0.22077499 -2.348684232  0.17159598  2.30274484
6  1.09793252  0.1770926 -1.8031436  1.57929431  0.152630323  0.99637941  1.35516155
         pom sample io
1 -1.6251715      1  1
2  0.1397944      2  2
3  1.4852868      1  3
4  0.8936914      2  4
5  0.5139919      1  5
6  0.3069162      2  1
This will not match your data exactly, but it should be close enough to show the next steps.
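Before fitting anything, it can help to confirm how many rows the filter actually keeps, since a small subset leaves few degrees of freedom. The sketch below rebuilds the fake data with a seed (set.seed(42) is an arbitrary choice, not part of the original code) and uses rep_len so the recycled columns always match nrow(df); rep(1:5, times = 25), for instance, would produce 125 values and fail on a 100-row data frame.

```r
# Rebuild the fake data reproducibly; the seed value is an arbitrary assumption.
set.seed(42)
param_names <- c("reg", "infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df <- as.data.frame(matrix(rnorm(8 * 100), ncol = 8))
colnames(df) <- param_names

# rep_len() recycles to exactly nrow(df) values, avoiding length mismatches.
df$sample <- rep_len(1:2, nrow(df))
df$io <- rep_len(1:5, nrow(df))

# Cross-tabulate the two filter variables to see every cell size.
print(table(sample = df$sample, io = df$io))

# Number of rows that satisfy the Stata-style condition sample==1 & io==4.
n_kept <- sum(df$sample == 1 & df$io == 4)
print(n_kept)
```

With this layout the filter keeps 10 rows, so a model with an intercept and six predictors is left with only 3 residual degrees of freedom.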
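We can start with an ordinary least squares regression. Your error occurs because the subset argument of lm expects a vector that selects rows (for example a logical condition), not a data frame. The key point here is to call the subset function inside lm as part of the data argument, as shown below. Written this way, the data used to fit the model is the rows of the df data.frame where sample equals 1 and io equals 4.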
fit <- lm(infl ~ infoprov + access + staff + int_cso + int_cor + pom, data = subset(df, sample == 1 & io == 4))
summary(fit)
This outputs the following:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.039153   0.386925  -0.101    0.926
infoprov     0.212023   0.389932   0.544    0.624
access      -0.006026   0.248638  -0.024    0.982
staff        0.458822   0.376343   1.219    0.310
int_cso     -0.946274   0.727029  -1.302    0.284
int_cor      0.030236   0.505336   0.060    0.956
pom          0.209700   0.550614   0.381    0.729
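The form closest to what the question attempted is to keep data as the full data frame and pass a logical condition, rather than a data frame, to the subset argument of lm. A minimal sketch on the same fake data layout (the seed and the object names f, fit_a, and fit_b are illustrative assumptions) suggests both routes give the same coefficients:

```r
# Fake data with the same layout as above; the seed is an arbitrary assumption.
set.seed(1)
df <- as.data.frame(matrix(rnorm(8 * 100), ncol = 8))
colnames(df) <- c("reg", "infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df$sample <- rep(1:2, times = 50)
df$io <- rep(1:5, times = 20)

f <- infl ~ infoprov + access + staff + int_cso + int_cor + pom

# Option 1: filter the data frame before it reaches lm().
fit_a <- lm(f, data = subset(df, sample == 1 & io == 4))

# Option 2: pass a logical condition to lm()'s own subset argument.
# The condition is evaluated inside `data`, so bare column names work.
fit_b <- lm(f, data = df, subset = sample == 1 & io == 4)

# Compare the two sets of coefficients.
print(all.equal(coef(fit_a), coef(fit_b)))
```

Either form works; the second keeps the filter next to the model call, which reads much like the Stata if qualifier.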
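You mentioned robust regression. Note that Stata's robust option keeps the OLS coefficients and replaces only the standard errors with heteroskedasticity-robust ones, whereas rlm from the MASS package fits a robust M-estimator, so the coefficients themselves change. To fit the latter, we can use MASS as follows: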
# Robust Regression
library(MASS)
fit_robust <- rlm(infl ~ infoprov + access + staff + int_cso + int_cor + pom, data = subset(df, sample == 1 & io == 4))
summary(fit_robust)
Note the standard errors below:
Coefficients:
            Value   Std. Error t value
(Intercept)  0.0593  0.3741     0.1586
infoprov     0.3140  0.3770     0.8328
access      -0.0138  0.2404    -0.0574
staff        0.4189  0.3638     1.1512
int_cso     -1.0356  0.7029    -1.4734
int_cor     -0.0245  0.4885    -0.0501
pom          0.2463  0.5323     0.4627
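If the goal is to reproduce the Stata output rather than to fit an M-estimator, one common approach is to keep the lm coefficients and swap in heteroskedasticity-robust (HC1) standard errors. The sketch below computes them by hand in base R on the fake data; the seed and the helper name hc1_se are illustrative assumptions, not part of any package. In practice, vcovHC(fit, type = "HC1") from the sandwich package together with coeftest from lmtest is the more usual route.

```r
# Fake data with the same layout as above; the seed is an arbitrary assumption.
set.seed(1)
df <- as.data.frame(matrix(rnorm(8 * 100), ncol = 8))
colnames(df) <- c("reg", "infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df$sample <- rep(1:2, times = 50)
df$io <- rep(1:5, times = 20)

fit <- lm(infl ~ infoprov + access + staff + int_cso + int_cor + pom,
          data = subset(df, sample == 1 & io == 4))

# Hypothetical helper: HC1 standard errors for an lm fit.
hc1_se <- function(model) {
  X <- model.matrix(model)      # design matrix, n x k
  e <- residuals(model)         # OLS residuals, length n
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))  # (X'X)^-1
  meat <- crossprod(X * e)      # X' diag(e^2) X, since row i of X is scaled by e_i
  vc <- bread %*% meat %*% bread * n / (n - k)  # HC1 small-sample correction
  sqrt(diag(vc))
}

robust_se <- hc1_se(fit)

# OLS estimates paired with robust standard errors and the implied t values.
print(cbind(Estimate = coef(fit),
            `Robust SE` = robust_se,
            `t value` = coef(fit) / robust_se))
```

The estimates are identical to those from summary(fit); only the standard errors, and therefore the t values, differ. The n / (n - k) factor is what distinguishes HC1 from the uncorrected HC0 estimator.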