Here is a simplified dataset.
data <- structure(list(group = c("Unedited", "Partial_promoter", "Promoter_and_ATG",
"ATG", "Promoter", "Unedited", "Partial_promoter", "Promoter_and_ATG",
"ATG", "Promoter", "Unedited", "Partial_promoter", "Unedited",
"Partial_promoter", "Promoter_and_ATG", "ATG", "Promoter", "Unedited",
"Partial_promoter", "Promoter_and_ATG"), day = c(6, 6, 6, 6,
6, 10, 10, 10, 10, 10, 13, 13, 6, 6, 6, 6, 6, 10, 10, 10), x = c(114.243333333333,
114.41, 113.426666666667, 113.46, 114.463333333333, 114.473333333333,
115.453333333333, 113.426666666667, 114.373333333333, 114.37,
115.276666666667, 114.136666666667, 114.243333333333, 114.463333333333,
114.476666666667, 113.493333333333, 114.603333333333, 114.51,
115.496666666667, 113.52)), row.names = c(NA, -20L), class = "data.frame")
My linear model looks like this:
model <- lm(x ~ group + day, data = data)
summary(model)
I then want to calculate the relative importance of the variables, like this:
library(relaimpo)
calc.relimp(model,
type=c("lmg","last","first","pratt"),
rela=TRUE)
But I get this error. What does it mean, and how can I make it work?
Error in cov.wt(y, wt = wt) : 'x' must contain finite values only
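The problem appears to be that calc.relimp cannot compute the covariance for the factor variable group. It looks like calc.relimp has no automatic way of grouping the indicator (dummy) columns of a factor together.
Below is a solution that uses model.matrix to create the factor's indicator variables "manually" and then groups them together.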
relaimpo::calc.relimp(x ~ groupPartial_promoter + groupPromoter +
groupPromoter_and_ATG + groupUnedited + day,
data = data |> model.matrix(~ ., data = _) |> as.data.frame(),
groups = 2:5)
Response variable: x
Total response variance: 0.39512
Analysis based on 20 observations
5 Regressors:
Some regressors combined in groups:
Group G1 : groupPartial_promoter groupPromoter groupPromoter_and_ATG groupUnedited
Relative importance of 2 (groups of) regressors assessed:
G1 day
Proportion of variance explained by model: 54.61%
Metrics are not normalized (rela=FALSE).
Relative importance metrics:
lmg
G1 0.47353455
day 0.07252308
Average coefficients for different model sizes:
1group 2groups
groupPartial_promoter 1.01644444 0.93185772
groupPromoter 0.70333333 0.70333333
groupPromoter_and_ATG -0.06305556 -0.09689024
groupUnedited 0.77377778 0.68919106
day 0.08195230 0.05075203
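To see where the indicator names in the formula above come from, it can help to look at what model.matrix produces. The sketch below uses a small made-up data frame (toy and its values are purely illustrative and are not part of the question's data) to show that a factor with k levels expands into k - 1 indicator columns named after the variable and the level, with the first level serving as the baseline.

```r
# Hypothetical toy data, only meant to illustrate the column layout
toy <- data.frame(
  group = c("a", "b", "c", "a", "b", "c"),
  day   = c(1, 1, 1, 2, 2, 2),
  x     = c(1.0, 2.0, 3.0, 1.5, 2.5, 3.5)
)

# Expand every column of the data frame into a numeric design matrix;
# the character column `group` is treated as a factor and dummy-coded
mm <- as.data.frame(model.matrix(~ ., data = toy))

# Column names of the expanded data frame
colnames(mm)

# Positions of the indicator columns that belong to `group`
group_cols <- grep("^group", colnames(mm))
group_cols
```

In the question's data, "ATG" is the alphabetically first level of group, so it becomes the baseline and the remaining four levels are the indicator columns listed in the formula.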
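These are the same values as those obtained with similar software, the {domir} package (disclosure: I am the author of that package), which is intended to extend relaimpo. Its overall interface is more complex but, because of how it is structured, it does group the indicators of a factor variable together.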
domir::domir(
formula(model),
\(fml) {lm(fml, data = data) |>
summary() |>
_[["r.squared"]] },
.cdl = FALSE, .cpt = FALSE)
Overall Value: 0.5460576
General Dominance Values:
General Dominance Standardized Ranks
group 0.47353455 0.8671879 1
day 0.07252308 0.1328121 2
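With only two (groups of) regressors, the lmg / general dominance value of each one is the average of its R-squared contribution when entered first and when entered last. The base R sketch below works through that arithmetic by hand as a cross-check. The helper r2() and the variable names are my own and do not come from either package. The results should agree with the values reported above.

```r
# The question's dataset, rebuilt with data.frame()
data <- data.frame(
  group = c("Unedited", "Partial_promoter", "Promoter_and_ATG", "ATG", "Promoter",
            "Unedited", "Partial_promoter", "Promoter_and_ATG", "ATG", "Promoter",
            "Unedited", "Partial_promoter", "Unedited", "Partial_promoter",
            "Promoter_and_ATG", "ATG", "Promoter", "Unedited",
            "Partial_promoter", "Promoter_and_ATG"),
  day = c(6, 6, 6, 6, 6, 10, 10, 10, 10, 10, 13, 13, 6, 6, 6, 6, 6, 10, 10, 10),
  x = c(114.243333333333, 114.41, 113.426666666667, 113.46, 114.463333333333,
        114.473333333333, 115.453333333333, 113.426666666667, 114.373333333333,
        114.37, 115.276666666667, 114.136666666667, 114.243333333333,
        114.463333333333, 114.476666666667, 113.493333333333, 114.603333333333,
        114.51, 115.496666666667, 113.52)
)

# Hypothetical helper: R-squared of a linear model fitted to `data`
r2 <- function(fml) summary(lm(fml, data = data))$r.squared

r2_full  <- r2(x ~ group + day)  # both terms
r2_group <- r2(x ~ group)        # group only
r2_day   <- r2(x ~ day)          # day only

# Average the increment in R-squared over the two possible entry orders
lmg_group <- (r2_group + (r2_full - r2_day)) / 2
lmg_day   <- (r2_day + (r2_full - r2_group)) / 2

# Raw values and their shares of the overall R-squared
c(group = lmg_group, day = lmg_day)
c(group = lmg_group, day = lmg_day) / r2_full
```

By construction the two values sum to the full model's R-squared. Dividing by that total gives the standardized shares, which is the kind of normalization rela=TRUE requests in calc.relimp.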