柯西家族的 GLM

Question

我想用

family=Cauchy()

在 R 上训练 GLM，但我找不到任何方法。看起来它没有任何意义，或者高斯族可以完成这项工作。有人可以解释我该怎么做或为什么我不能吗？

Answer 1

您可以使用 heavy 包（使用重尾分布的稳健估计）。

library(heavy)
data(ereturns)
fit <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit)

Answer 2

您不能使用 GLM 执行此操作的原因有两个（特别是使用

glm()

）。

从狭义上讲，GLM 仅定义为指数族（泊松、二项式、高斯、伽马等）中的条件分布。
更一般地说，您可以将 GLM 扩展到 准似然 模型，这些模型基于特定分布的均值-方差关系。但是 Cauchy 有未定义/无限的均值和方差，所以这是行不通的。

虽然@StéphaneLaurent 建议使用

heavy

包可能是最好的，但您也可以使用一般的最大似然法来做到这一点：

library(bbmle)
fit2 <- mle2(m.marietta ~ dcauchy(mu, exp(logsc)), 
             parameters = list(mu~CRSP), 
             start=list(mu=0, logsc=0), 
             data=ereturns)

coef(fit)

和

coef(fit2)

表明结果基本相同

Answer 3

另一种方法是具有 t 分布误差的贝叶斯回归。

回想一下，Cauchy 分布等同于 nu = 1 自由度的 t 分布（参考 Wikipedia）。

因此，与其将自由度固定为 1，不如将它们与其他模型参数一起估计。

这里是如何使用 brms 包做到这一点。我们将使用 nu 的默认先验，即 Gamma(2, 0.1)。

data(ereturns, package = "heavy")

library("brms")

fit.bayes <- brm(
  m.marietta ~ CRSP,
  family = student,
  data = ereturns,
  seed = 1234
)
prior_summary(fit.bayes)
#>                 prior     class coef group resp dpar nlpar lb ub       source
#>                (flat)         b                                       default
#>                (flat)         b CRSP                             (vectorized)
#>  student_t(3, 0, 2.5) Intercept                                       default
#>         gamma(2, 0.1)        nu                             1         default
#>  student_t(3, 0, 2.5)     sigma                             0         default

贝叶斯分析的结果是，即使分布族是重尾的，自由度的后验估计在3到4之间，而不是1（后验均值= 4.4，后验模数= 3.3）。

summary(fit.bayes)
#>  Family: student 
#>   Links: mu = identity; sigma = identity; nu = identity 
#> Formula: m.marietta ~ CRSP 
#>    Data: ereturns (Number of observations: 60) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept    -0.01      0.01    -0.02     0.01 1.00     3552     2674
#> CRSP          1.30      0.21     0.90     1.73 1.00     3267     2322
#> 
#> Family Specific Parameters: 
#>       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma     0.06      0.01     0.04     0.08 1.00     2885     3281
#> nu        4.39      2.28     1.96     9.90 1.00     3435     3017
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

mode <- function(x) {
  z <- density(x) 
  z$x[which.max(z$y)] 
}

as_draws_df(fit.bayes) %>%
  summarise(
    across(nu, list(nu_posterior_mean = mean, posterior_mode = mode))
  )
#> # A tibble: 1 × 2
#>   nu_mean nu_posterior_mode
#>     <dbl>             <dbl>
#> 1    4.39              3.35

为了比较，这里是@StéphaneLaurent 建议的 heavy 包的估计值。

library("heavy")

fit.heavy <- heavyLm(m.marietta ~ CRSP, data = ereturns, family = Cauchy())
summary(fit.heavy)
#> Linear model under heavy-tailed distributions
#>  Data: ereturns; Family: Cauchy() 
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -0.135080 -0.032849  0.003565  0.044033  0.560639 
#> 
#> Coefficients:
#>              Estimate Std.Error Z value p-value
#> (Intercept) -0.0096   0.0072   -1.3378  0.1810
#> CRSP         1.1638   0.1670    6.9669  0.0000
#> 
#> Degrees of freedom: 60 total; 58 residual
#> Scale estimate: 0.001479587
#> Log-likelihood: 66.42185 on 3 degrees of freedom

^{创建于 2023-04-07 与 reprex v2.0.2}

柯西家族的 GLM

问题描述投票：0回答：3

3个回答

最新问题

柯西家族的 GLM

问题描述 投票：0回答：3

3个回答

最新问题

问题描述投票：0回答：3