How to reverse-engineer a dataset?

Question

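My teacher gave us a regression result, and the exercise is to reverse-engineer a dataset that produces it. We then have to run the regression on that dataset and get exactly the same results.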

Regression result

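I managed to get the coefficient to -19.93, but I can't get the same SE at all. I don't know how to proceed. I don't know whether I should use a formula linking the SE of the estimator to the standard error of the regression (I have some formulas, but I don't really know how to implement them in R)... Thanks in advance for your help!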

My R code:

## Given values
n <- 1592
se_β1 <- 1.47
β1hat <- -19.93

## Create a dummy variable
Low_anchor <- c(rep(0, Nc/2), rep(1, Nc/2))

## Formula of standard error of beta 1 (assuming homoskedasticity)
calculate_standard_error <- function(u, Low_anchor) {
  sqrt((1/(n - 2))*sum(u^2)/(n*sd(Low_anchor)^2))
}

## Define initial values of u
u <- rnorm(n)

## Tolerance for convergence
tolerance <- 0.1

## Iteratively adjust u until the standard error matches the target
while (abs(calculate_standard_error(u, Low_anchor) - se_β1) > tolerance) {
  ## Generate new set of values for u from a normal distribution
  u <- rnorm(n)
}

print(u)

## regression
Yc <- -19.93*Low_anchor + u
model1 <- lm(Yc ~ Low_anchor - 1)

## Print the summary of the model
summary(model1)

Answer 1

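It looks like Nc is not defined. I think you meant it to refer to the independent variable. Here I will use x. Note that the question seems to ask you to use the properties of the coefficients and standard errors in a (multiple) regression of the form y = bx + u. You need to know that: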

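If you multiply x by some number a, the estimated coefficient becomes b/a.
If you multiply u by some number a, the estimated standard error becomes a*s, where s is the standard error before rescaling.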
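The two properties can be checked numerically. The following is only a small sketch on made-up data: the sample size, the true slope of 2, the factor a = 3, and the helper name slope_and_se are all assumptions chosen for illustration, not part of the exercise.

```r
# Illustrative data; the sizes and the true slope are arbitrary assumptions.
set.seed(1)
n <- 200
x <- rnorm(n)
u <- rnorm(n)
a <- 3

# Helper (hypothetical name): slope estimate and its standard error.
slope_and_se <- function(y, x) {
  summary(lm(y ~ x))$coefficients[2, c("Estimate", "Std. Error")]
}

base <- slope_and_se(2 * x + u, x)

# Property 1: multiplying x by a divides the estimated slope by a.
scaled_x <- slope_and_se(2 * x + u, a * x)
all.equal(unname(scaled_x["Estimate"]), unname(base["Estimate"]) / a)

# Property 2: multiplying u by a multiplies the standard error by a.
scaled_u <- slope_and_se(2 * x + a * u, x)
all.equal(unname(scaled_u["Std. Error"]), unname(base["Std. Error"]) * a)
```

Both comparisons should hold up to floating-point tolerance. Rescaling x also divides the standard error by a, and rescaling u generally shifts the estimated slope, so the two adjustments interact. That is why adjusting one after the other takes a few rounds to settle.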

With this, you can write a simple loop that first adjusts u and then adjusts x. First we define some preliminaries:

n <- 1592
se_b1 <- 1.47
b1hat <- -19.93

set.seed(2)
x <- rnorm(n)
y_mean <- -19.93*x

# we are going to create a random variable to be the residuals
u <- rnorm(n,0,1)

error <- 1
tol <- 0.01

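Note that these distributions do not matter for the answer. You can check that the mean of u is far from what we want. You can also check what happens after running

y <- y_mean + u
summary(lm(y ~ x))

The coefficient and the standard error will be different from what you want. How do we fix that? By using the two properties mentioned above.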

error <- 1
tol <- 0.01

while (error > tol) {
  # remember that se_b1 is the square root of a diagonal element of sigma^2 * (X'X)^-1
  
  # determine the matrix of X (assuming there is an intercept here)
  X <- matrix(c(rep(1, n), x), ncol = 2)
  XX_minus_one <- solve(t(X) %*% X)
  
  # so far, we would get a standard error of
  present_se <- sqrt(var(u) * XX_minus_one[2,2])
  # this is different.  Let's adjust the residuals to have the desired variance
  u_fitting <- u * se_b1 / present_se
  
  y <- y_mean + u
  reg <- lm(y ~ x)
  
  estimated_b1 <- reg$coefficients[2]
  estimated_se_b1 <- summary(reg)$coefficients[2,2]
  
  error <- max(abs(estimated_se_b1 - se_b1), abs(estimated_b1 - b1hat))
  
  # but now we need to refit the x 
  x <- x *  estimated_b1/b1hat
  u <- u_fitting
}

You can check that it works:

summary(lm(y ~ x))
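For comparison, here is a non-iterative variant. It is not the loop above but a different construction, sketched under the assumption that only the slope and its standard error need to match. The idea is to use residuals that are exactly orthogonal to the intercept and to x, so the fitted slope equals the target exactly, and then to scale those residuals so the standard error matches as well. The names e, r, Sxx, and c_scale are hypothetical, and x could just as well be a 0/1 dummy.

```r
# Targets taken from the question.
n <- 1592
se_b1 <- 1.47
b1hat <- -19.93

set.seed(2)
x <- rnorm(n)   # any non-constant regressor works
e <- rnorm(n)   # raw noise, used only to build the residuals

# Residuals of e on (1, x) are orthogonal to both the intercept and x,
# so adding them to b1hat * x leaves the fitted slope exactly at b1hat.
r <- resid(lm(e ~ x))

# SE(slope) = sqrt( sum(residuals^2) / (n - 2) / Sxx ); solve for the scale.
Sxx <- sum((x - mean(x))^2)
c_scale <- se_b1 * sqrt((n - 2) * Sxx / sum(r^2))

y <- b1hat * x + c_scale * r
fit <- lm(y ~ x)
summary(fit)$coefficients
```

In this sketch the intercept comes out at essentially zero. If the target output also reports an intercept, a constant can be added to y without changing the slope or its standard error.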
