从glm()中删除完全分开的观察结果

问题描述 投票:1回答:1

我正在使用HMDA软件包中的AER数据进行一些探索性数据分析;但是,我用来拟合模型的变量似乎包含一些可以完美确定结果的观察结果,这一问题称为“分离”。因此,我尝试使用this thread推荐的解决方案对此进行补救,但是当我尝试从glm.fit()执行第一组源代码时,R返回了一条错误消息:

Error in family$family : object of type 'closure' is not subsettable

因此,我无法继续使用此代码从我的数据中删除那些完全确定的观察结果。我想知道是否有人可以帮助我解决此问题?

下面提供了我当前的代码供您参考。

# load the AER package and HMDA data
library(AER)
data(HMDA)

# fit a 2-degree olynomial probit model 
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA)

# using the revised source code from that stackexchage thread to find out observations that received a warning message
library(tidyverse)
library(dplyr)
library(broom)

eps <- 10 * .Machine$double.eps
if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

# this return the following error message
# Error in family$family : object of type 'closure' is not subsettable


probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

arrange(probit.resids, desc(.fitted)) %>%  
  select(2:5, p, warning) %>% 
  slice(1:10)


HMDA.nwarning <- filter(HMDA, !probit.resids$warning)

# using HMDA.nwarning should solve the problem...
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA.nwarning)
r subset logistic-regression glm
1个回答
1
投票

这段代码

if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

存在一个函数,当您使用family ==“ binomial”运行glm时会调用binomial()。如果查看glm(只需键入glm):

if (is.character(family)) 
        family <- get(family, mode = "function", envir = parent.frame())
    if (is.function(family)) 
        family <- family()
    if (is.null(family$family)) {
        print(family)
        stop("'family' not recognized")
    }

并且glm函数在拟合过程中检查binomial()$ family,并且如果任何预测值与eps的1或0都不相同,则会引发该警告。

您不需要运行该部分,是的,您需要设置eps <- 10 * .Machine$double.eps。因此,让我们运行下面的代码,如果要运行一个概率,则需要以二项式指定link="probit",否则默认为logit:

library(AER)
library(tidyverse)
library(dplyr)
library(broom)

data(HMDA)

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA)

eps <- 10 * .Machine$double.eps

probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

列警告指示观察值是否发出警告,在此数据集中,有一个:

table(probit.resids$warning)

FALSE  TRUE 
 2379     1

我们可以使用下一步进行过滤

HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
dim(HMDA.nwarning)
[1] 2379   14

并重新运行回归:

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA.nwarning)
coefficients(probit.fit)
(Intercept) poly(hirat, 2)1 poly(hirat, 2)2 
      -1.191292        8.708494        6.884404
© www.soinside.com 2019 - 2024. All rights reserved.