How do I reverse-engineer a dataset in R?


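My teacher gave us a regression result, and the exercise is to reverse-engineer a dataset that produces it. We then need to run the regression on that dataset and obtain exactly the same result.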

Regression result

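I managed to get the coefficient to -19.93, but I cannot get the same SE at all, and I don't know how to proceed. I'm not sure whether I should use formulas that link the SE of the estimator to the standard error of the regression (I have some, but I don't really know how to implement them in R)... Thanks in advance for your help!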

My R code:

# Given values
n<- 1592
se_β1 <- 1.47
β1hat <- -19.93

# Create a dummy variable
Low_anchor <- c(rep(0, Nc/2), rep(1, Nc/2))

# Formula of standard error of beta 1 (assuming homoskedasticity)
calculate_standard_error <- function(u, Low_anchor) {
  sqrt((1/(n-2)) * sum(u^2) / (n*sd(Low_anchor)^2))
}

# Define initial values of u
u <- rnorm(n)

# Tolerance for convergence
tolerance <- 0.1

# Iteratively adjust u until the standard error matches the target
while (abs(calculate_standard_error(u, Low_anchor) - se_β1) > tolerance) {
  # Generate new set of values for u from a normal distribution
  u <- rnorm(n)
}

print(u)

# regression
Yc <- -19.93 *Low_anchor + u
model1 <- lm(Yc ~ Low_anchor - 1)

# Print the summary of the model
summary(model1)
r regression linear-regression standard-error
1 Answer

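It looks like Nc is not defined. I assume you mean it to describe the independent variable; here I will use x instead. Note that the exercise seems to want you to use properties of the standard errors and coefficients in a (multiple) regression of the form y = bx + u. You need to know that: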

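  1. If you multiply x by some number a, the estimated coefficient becomes b/a.
  2. If you multiply u by some number a, the estimated standard error becomes a*s, where s is the original standard error.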
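
Both properties can be checked numerically. The sketch below is only an illustration on simulated data: the seed, the sample size, the true slope of 2, and the factor a = 3 are arbitrary choices, not part of the exercise, and u in property 2 is taken to mean the fitted residuals.

```r
set.seed(1)                # arbitrary seed, for reproducibility only
n <- 200                   # arbitrary sample size for this demo
x <- rnorm(n)
y <- 2 * x + rnorm(n)      # arbitrary data-generating process
a <- 3                     # arbitrary scaling factor

fit <- lm(y ~ x)
b <- coef(fit)[["x"]]
s <- summary(fit)$coefficients["x", "Std. Error"]

# Property 1: multiplying x by a divides the estimated slope by a
x_scaled <- a * x
fit_x <- lm(y ~ x_scaled)
b_x <- coef(fit_x)[["x_scaled"]]

# Property 2: multiplying the residuals by a multiplies the SE by a.
# The slope itself is unchanged because OLS residuals are orthogonal to x.
y_u <- fitted(fit) + a * resid(fit)
fit_u <- lm(y_u ~ x)
b_u <- coef(fit_u)[["x"]]
s_u <- summary(fit_u)$coefficients["x", "Std. Error"]

all.equal(b_x, b / a)
all.equal(s_u, a * s)
all.equal(b_u, b)
```

All three comparisons should hold up to floating-point tolerance. Note that rescaling x also rescales the standard error of the slope, which is why the two adjustments have to be repeated rather than applied once each.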

With this, you can write a simple loop that first adjusts u and then adjusts x. First we define some preliminaries:

n<- 1592
se_b1 <- 1.47
b1hat <- -19.93

set.seed(2)
x <- rnorm(n)
y_mean <- -19.93*x

# we are going to create a random variable to be the residuals
u <- rnorm(n,0,1)

error <- 1
tol <- 0.01

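Note that the choice of distributions does not matter for the answer. You can check that u, as drawn, is far from what we want. You can also check what happens when you run:

y <- y_mean + u
summary(lm(y ~ x))

The coefficient and the standard error will differ from the target values. How do we fix that? By using the two properties mentioned above.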

error <- 1
tol <- 0.01

while (error > tol) {
  # remember that se_b1 is the square root of the diagonal element of sigma^2 * (X'X)^-1
  
  # determine the matrix of X (assuming there is an intercept here)
  X <- matrix(c(rep(1, n), x), ncol = 2)
  XX_minus_one <- solve(t(X) %*% X)
  
  # so far, we would get a standard error of
  present_se <- sqrt(var(u) * XX_minus_one[2,2])
  # this is different.  Let's adjust the residuals to have the desired variance
  u_fitting <- u * se_b1 / present_se
  
  y <- y_mean + u
  reg <- lm(y ~ x)
  
  estimated_b1 <- reg$coefficients[2]
  estimated_se_b1 <- summary(reg)$coefficients[2,2]
  
  error <- max(abs(estimated_se_b1 - se_b1), abs(estimated_b1 - b1hat))
  
  # but now we need to refit the x 
  x <- x *  estimated_b1/b1hat
  u <- u_fitting
}

You can check that it worked:

summary(lm(y ~ x))
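
The loop only matches the targets to within tol. If an exact match is needed, the same two properties suggest a direct construction. The following is a sketch of an alternative, not the method used in the loop above: draw noise, make it exactly orthogonal to the regressors by taking residuals, then rescale it so that the implied standard error equals the target. The names e, sxx, and sigma_target are illustrative, and the 0/1 regressor with two equal-sized groups is an assumption borrowed from the Low_anchor dummy in the question.

```r
n     <- 1592
se_b1 <- 1.47
b1hat <- -19.93

set.seed(2)                          # arbitrary seed
x <- rep(c(0, 1), each = n / 2)      # assumed: equal-sized 0/1 groups

# Random draw made exactly orthogonal to the intercept and to x
e <- resid(lm(rnorm(n) ~ x))

# With an intercept, se(b1)^2 = sigma_hat^2 / sum((x - mean(x))^2),
# where sigma_hat^2 = sum(e^2) / (n - 2). Solve for the scale of e.
sxx          <- sum((x - mean(x))^2)
sigma_target <- se_b1 * sqrt(sxx)
e <- e * sigma_target / sqrt(sum(e^2) / (n - 2))

# Build y; the intercept is implicitly 0 in this sketch
y   <- b1hat * x + e
fit <- lm(y ~ x)
summary(fit)$coefficients
```

Because e is orthogonal to both the constant and x, the fitted slope equals b1hat and the residuals equal e, so the reported standard error equals se_b1 up to floating-point error. Adding a constant to y shifts only the intercept estimate, leaving the slope and its standard error unchanged.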