I need the R code to run this exact Stata regression. It has an if condition


This is the regression in Stata:

reg infl infoprov access staff int_cso int_cor pom if sample==1 & io==4, robust

I tried to create a subset in R, but it gives me an error.

I create the subset like this:

df <- data.frame(mydata$sample,mydata$io)

df <- df[df$mydata.sample==1,]

df <- df[df$mydata.io==4,]

Then I run the model like this:

fit <- lm(formula=infl~infoprov+access+staff+int_cso+int_cor+pom, data = mydata, subset=df)

But it gives me this error: "Error in `[.default`(xj, i) : invalid subscript type 'list'"

I don't know how to translate this regression into R.

1 Answer

First, I'll generate some fake data that looks like what you are working with.

param_names <- c("reg", "infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")

df <- as.data.frame(matrix(rnorm(8*100), ncol = 8))

colnames(df) <- param_names

# add the other features

df$sample <- rep(1:2, times = 50)
df$io <- rep(1:5, times = 20)

head(df)

This outputs the following (the values are random, so yours will differ):


          reg       infl   infoprov      access        staff     int_cso     int_cor
1 -0.99241450 -0.1410155  0.1263028  1.85004931 -0.875000485  1.68249914 -0.41617040
2  0.36195391 -1.4262807 -0.6918764  1.27094860  0.943553888 -0.88935698 -0.29580728
3 -0.03744011  0.9701812 -0.8399298 -1.52706348 -1.663372016  0.06849402  0.94382547
4 -1.50628547  0.4746796 -0.8827392  0.04912427 -0.008277577 -0.51145104  0.07805638
5  0.95119030  0.8774569 -1.2075175 -0.22077499 -2.348684232  0.17159598  2.30274484
6  1.09793252  0.1770926 -1.8031436  1.57929431  0.152630323  0.99637941  1.35516155
         pom sample io
1 -1.6251715      1  1
2  0.1397944      2  2
3  1.4852868      1  3
4  0.8936914      2  4
5  0.5139919      1  5
6  0.3069162      2  1

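This will not match your data exactly, but it should be close enough to illustrate the next steps.

We can start with an ordinary least squares regression. The key point is that the subset() function should be called inside lm() as part of the data argument, as shown below. Your original call failed because the subset argument of lm() expects a logical condition (or an index vector), not a data frame. Written this way, the model is fit on the rows of df where sample equals 1 and io equals 4.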

fit <- lm(infl ~ infoprov + access + staff + int_cso + int_cor + pom, data = subset(df, sample == 1 & io == 4))

summary(fit)

This outputs the following:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.039153   0.386925  -0.101    0.926
infoprov     0.212023   0.389932   0.544    0.624
access      -0.006026   0.248638  -0.024    0.982
staff        0.458822   0.376343   1.219    0.310
int_cso     -0.946274   0.727029  -1.302    0.284
int_cor      0.030236   0.505336   0.060    0.956
pom          0.209700   0.550614   0.381    0.729
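
As an alternative, the same condition can be handed straight to lm() through its own subset argument, which is evaluated inside data. The sketch below uses simulated data (the seed, the row count, and the names `f`, `fit_a`, and `fit_b` are my own choices for illustration, not part of the question) to check that both spellings fit the same model:

```r
# Simulated stand-in for the real data; values are illustrative only.
set.seed(1)
vars <- c("infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df <- as.data.frame(matrix(rnorm(length(vars) * 500), ncol = length(vars)))
colnames(df) <- vars
df$sample <- rep(1:2, times = 250)
df$io <- rep(1:5, times = 100)

f <- infl ~ infoprov + access + staff + int_cso + int_cor + pom

# Option A: filter the data frame before it reaches lm().
fit_a <- lm(f, data = subset(df, sample == 1 & io == 4))

# Option B: pass the condition itself to lm()'s subset argument.
fit_b <- lm(f, data = df, subset = sample == 1 & io == 4)

# Both fits use the same rows, so the coefficients should agree.
all.equal(coef(fit_a), coef(fit_b))

# Number of observations that satisfied the condition.
nobs(fit_a)
```

Either form works; filtering with subset() first can be handier if the restricted data frame is needed again later.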

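You mentioned robust regression. If that is what you want, the MASS package provides rlm(), as follows. Be aware, though, that the robust option of Stata's reg keeps the OLS coefficients and only replaces the standard errors with heteroskedasticity-robust ones, so the rlm() coefficients will not match Stata's.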

# Robust Regression
library(MASS)
fit_robust <- rlm(infl ~ infoprov + access + staff + int_cso + int_cor + pom, data = subset(df, sample == 1 & io == 4))

summary(fit_robust)

Note the standard errors below:

Coefficients:
            Value   Std. Error t value
(Intercept)  0.0593  0.3741     0.1586
infoprov     0.3140  0.3770     0.8328
access      -0.0138  0.2404    -0.0574
staff        0.4189  0.3638     1.1512
int_cso     -1.0356  0.7029    -1.4734
int_cor     -0.0245  0.4885    -0.0501
pom          0.2463  0.5323     0.4627
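
To get standard errors closer to what Stata's `, robust` reports, one option is to keep the ordinary lm() fit and compute a heteroskedasticity-consistent covariance matrix by hand. The sketch below uses the HC1 variant, which as far as I know is the one Stata applies, on simulated data; the seed, the data, and the names `d`, `vcov_hc1`, and `hc1_se` are made up for illustration.

```r
# Simulated stand-in for the real data; values are illustrative only.
set.seed(1)
vars <- c("infl", "infoprov", "access", "staff", "int_cso", "int_cor", "pom")
df <- as.data.frame(matrix(rnorm(length(vars) * 500), ncol = length(vars)))
colnames(df) <- vars
df$sample <- rep(1:2, times = 250)
df$io <- rep(1:5, times = 100)

# OLS on the rows matching the Stata if condition.
d <- subset(df, sample == 1 & io == 4)
fit <- lm(infl ~ infoprov + access + staff + int_cso + int_cor + pom, data = d)

X <- model.matrix(fit)   # design matrix, including the intercept column
e <- residuals(fit)      # OLS residuals
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^-1
meat <- crossprod(X * e)       # X' diag(e^2) X; each row of X is scaled by its residual

# HC1: sandwich estimator with the n / (n - k) small-sample correction.
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread
hc1_se <- sqrt(diag(vcov_hc1))

# Coefficients are the plain OLS ones; only the standard errors change.
cbind(estimate = coef(fit), robust_se = hc1_se, t = coef(fit) / hc1_se)
```

If the sandwich and lmtest packages are installed, `lmtest::coeftest(fit, vcov = sandwich::vcovHC(fit, type = "HC1"))` should give the same standard errors with less code.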