calc.relimp throws an error about finite values when used with a linear model


Here is a simplified dataset.

data <- structure(list(group = c("Unedited", "Partial_promoter", "Promoter_and_ATG", 
"ATG", "Promoter", "Unedited", "Partial_promoter", "Promoter_and_ATG", 
"ATG", "Promoter", "Unedited", "Partial_promoter", "Unedited", 
"Partial_promoter", "Promoter_and_ATG", "ATG", "Promoter", "Unedited", 
"Partial_promoter", "Promoter_and_ATG"), day = c(6, 6, 6, 6, 
6, 10, 10, 10, 10, 10, 13, 13, 6, 6, 6, 6, 6, 10, 10, 10), x = c(114.243333333333, 
114.41, 113.426666666667, 113.46, 114.463333333333, 114.473333333333, 
115.453333333333, 113.426666666667, 114.373333333333, 114.37, 
115.276666666667, 114.136666666667, 114.243333333333, 114.463333333333, 
114.476666666667, 113.493333333333, 114.603333333333, 114.51, 
115.496666666667, 113.52)), row.names = c(NA, -20L), class = "data.frame")

My linear model looks like this:

model <- lm(x ~ group + day, data = data)
summary(model)

I then want to calculate the relative importance of the variables, like this:

library(relaimpo)
calc.relimp(model,
            type=c("lmg","last","first","pratt"),
            rela=TRUE)

But I get the following error. What does it mean, and how can I make this work?

Error in cov.wt(y, wt = wt) : 'x' must contain finite values only
1 answer

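The problem is that calc.relimp cannot compute a covariance for the factor variable group.

It appears that calc.relimp has no automatic way to group a factor's indicator variables together.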
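To illustrate where a message like this can come from, here is a minimal sketch that calls stats::cov.wt directly on a data frame containing a character column. The toy data frame and its column names are hypothetical, and this is only an assumption about the mechanism: it shows that cov.wt rejects non-numeric input with the same message, not exactly what calc.relimp does internally.

```r
# Toy data (hypothetical), not the question's dataset
toy <- data.frame(g = c("a", "b", "c", "a"),
                  y = c(1.2, 2.3, 3.1, 0.9))

# A data frame with a character column becomes a character matrix,
# and is.finite() is FALSE for every element of a character matrix
m <- as.matrix(toy)
all_finite <- all(is.finite(m))

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    stats::cov.wt(toy)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

print(all_finite)
print(msg)
```

If the message matches the one in the question, that suggests a non-numeric column reached the covariance step, which is consistent with the factor variable being the cause.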

Below is a workaround that uses model.matrix to create the factor's indicator variables "manually" and then groups them together.

relaimpo::calc.relimp(x ~ groupPartial_promoter + groupPromoter + 
   groupPromoter_and_ATG + groupUnedited + day, 
   data = data |> model.matrix(~ ., data = _) |> as.data.frame(), 
   groups = 2:5)

Response variable: x 
Total response variance: 0.39512 
Analysis based on 20 observations 

5 Regressors: 
Some regressors combined in groups: 
        Group  G1 : groupPartial_promoter groupPromoter groupPromoter_and_ATG groupUnedited 

 Relative importance of 2 (groups of) regressors assessed: 
 G1 day 
 
Proportion of variance explained by model: 54.61%
Metrics are not normalized (rela=FALSE). 

Relative importance metrics: 

           lmg
G1  0.47353455
day 0.07252308

Average coefficients for different model sizes: 

                           1group     2groups
groupPartial_promoter  1.01644444  0.93185772
groupPromoter          0.70333333  0.70333333
groupPromoter_and_ATG -0.06305556 -0.09689024
groupUnedited          0.77377778  0.68919106
day                    0.08195230  0.05075203
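To make it clearer which columns the call above is grouping, the following sketch inspects what model.matrix produces for a data frame with a character column. The toy data and its values are hypothetical; only base R is used. It shows that one level of the factor (here the alphabetically first one) is dropped as the reference level and the rest become indicator columns prefixed with the variable name.

```r
# Toy data (hypothetical values), shaped like the question's data
toy <- data.frame(
  group = c("Unedited", "ATG", "Promoter", "ATG", "Unedited", "Promoter"),
  day   = c(6, 6, 6, 10, 10, 10),
  x     = c(114.2, 113.5, 114.5, 114.4, 114.5, 114.4)
)

# Expand every column; the character column becomes dummy variables
mm <- as.data.frame(model.matrix(~ ., data = toy))
print(colnames(mm))

# Locate the indicator columns by name rather than hard-coding positions
group_cols <- grep("^group", colnames(mm))
print(group_cols)
```

In the question's data the factor has five levels, so four indicator columns are created, which matches the four regressors listed under G1 in the output above.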

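These are the same values as those obtained from similar software, the {domir} package (disclosure: I am the author of this package), which is intended to extend relaimpo. Its overall interface is more complex, but because of how it is structured, it does group the factor's indicator variables together.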

domir::domir(
   formula(model), 
   \(fml) {lm(fml, data = data) |> 
      summary() |> 
      _[["r.squared"]] }, 
   .cdl = FALSE, .cpt = FALSE)

Overall Value:      0.5460576 

General Dominance Values:
      General Dominance Standardized Ranks
group        0.47353455    0.8671879     1
day          0.07252308    0.1328121     2
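With only two terms, the decomposition reported above can be reproduced by hand, which may help in reading the numbers. The sketch below is an illustration under stated assumptions, not the implementation used by either package: the toy data and the helper r2() are hypothetical, and it averages each term's R-squared gain over the two possible orders of entry. The Standardized column in the output above is each value divided by the overall value (for example, 0.47353455 / 0.5460576 ≈ 0.8671879).

```r
# Toy data (hypothetical values), not the question's dataset
toy <- data.frame(
  group = rep(c("A", "B", "C"), times = 4),
  day   = rep(c(6, 10, 13, 6), each = 3),
  x     = c(1.0, 2.1, 2.9, 1.4, 2.2, 3.5, 1.9, 2.8, 3.6, 1.1, 1.9, 3.2)
)

# Hypothetical helper: R-squared of a linear model for a given formula
r2 <- function(fml) summary(lm(fml, data = toy))$r.squared

r2_group <- r2(x ~ group)
r2_day   <- r2(x ~ day)
r2_full  <- r2(x ~ group + day)

# Average each term's R-squared gain over both orders of entry
gd <- c(
  group = (r2_group + (r2_full - r2_day)) / 2,
  day   = (r2_day + (r2_full - r2_group)) / 2
)

# Shares of the explained variance; these sum to 1
standardized <- gd / r2_full

print(gd)
print(standardized)
```

The values in gd sum to the full model's R-squared, and dividing by that total gives shares that sum to 1, which is the kind of normalization the question asked for with rela=TRUE.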