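I am trying to build a data frame of intercepts and coefficients for several linear regression models, each fitted on a subset of a data set. Unfortunately I can't share my data, but I can use the mtcars data set to explain it. I am fitting a regression model that predicts mpg from cyl, hp, and wt for each value of carb. After some research I found many examples that do what I want, but only for models with a single predictor (e.g. mpg ~ wt). When I add more terms, everything falls apart. This is what my work so far is based on: https://community.rstudio.com/t/extract-slopes-by-group-broom-dplyr/2751/8 and "Efficient way to extract coefficients from many linear regression lines".
Here is what I tried: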
library(tidyverse);library(broom)
df <- mtcars
tryme <- df %>%
split(.$carb)%>%
map(~lm(mpg~cyl+hp+wt,data=.x)) %>%
map_df(tidy)
which gives this result:
term estimate std.error statistic p.value
1 (Intercept) 46.034633 7.68430306 5.99073626 9.31E-03
2 cyl 2.650503624 4.14371413 0.63964442 5.68E-01
3 hp -0.230007961 0.1573354 -1.46189576 2.40E-01
4 wt -5.231999683 9.23136027 -0.56676368 6.11E-01
5 (Intercept) 39.84451509 2.95537984 13.48202845 1.03E-05
6 cyl -0.846094229 0.93995084 -0.90014732 4.03E-01
7 hp -0.007452737 0.03998485 -0.18638901 8.58E-01
8 wt -4.133340298 1.42757472 -2.89535829 2.75E-02
9 (Intercept) 17.50267062 22.13706712 0.79064993 5.74E-01
10 cyl NA NA NA NA
11 hp NA NA NA NA
12 wt -0.3115727 5.73067255 -0.05436931 9.65E-01
13 (Intercept) 45.33390978 12.93999647 3.50339429 1.28E-02
14 cyl -4.195214198 3.492613 -1.20116778 2.75E-01
15 hp 0.029361878 0.04927895 0.59583008 5.73E-01
16 wt -1.239041102 1.03937377 -1.19210349 2.78E-01
17 (Intercept) 19.7 NaN NaN NaN
18 cyl NA NA NA NA
19 hp NA NA NA NA
20 wt NA NA NA NA
21 (Intercept) 15 NaN NaN NaN
22 cyl NA NA NA NA
23 hp NA NA NA NA
24 wt NA NA NA NA
What I want is a table that looks like this:
carb intercept cyl hp wt
1 46.034633 2.650503624 -0.230007961 -5.231999683
2 39.84451509 -0.846094229 -0.007452737 -4.133340298
3 17.50267062 NA NA -0.3115727
4 45.33390978 -4.195214198 0.029361878 -1.239041102
6 19.7 NA NA NA
8 15 NA NA NA
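I don't know how to bring in the value of the grouping variable. If I could add it to what I already have, I know how to transpose the data into the shape I need.
Use summarise together with unnest_wider: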
df %>%
summarise(a = list(coef(lm(mpg ~ cyl + hp + wt, pick(everything())))), .by = carb) %>%  # pick() replaces cur_data(), deprecated since dplyr 1.1.0
unnest_wider(a, names_sep = "_")
# A tibble: 6 × 5
carb `a_(Intercept)` a_cyl a_hp a_wt
<dbl> <dbl> <dbl> <dbl> <dbl>
1 4 45.3 -4.20 0.0294 -1.24
2 1 46.0 2.65 -0.230 -5.23
3 2 39.8 -0.846 -0.00745 -4.13
4 3 17.5 NA NA -0.312
5 6 19.7 NA NA NA
6 8 15 NA NA NA
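If you would rather keep the split/map pipeline from the question, here is a minimal sketch of one way to carry the grouping value along. It assumes that the `.id` argument of `map_dfr()` is available in your purrr version (it stores the list names produced by `split()` in a new column) and that `pivot_wider()` from tidyr is acceptable for the reshaping step. The names `fits`, `coef_long`, and `coef_wide` are only illustrative.

```r
library(tidyverse)
library(broom)

# Fit one model per carb value; split() names each list element by carb.
fits <- mtcars %>%
  split(.$carb) %>%
  map(~ lm(mpg ~ cyl + hp + wt, data = .x))

# .id = "carb" copies the list names into a new column called carb.
coef_long <- map_dfr(fits, tidy, .id = "carb")

# Reshape to one row per carb and one column per model term.
coef_wide <- coef_long %>%
  pivot_wider(id_cols = carb, names_from = term, values_from = estimate)

coef_wide
```

Note that carb comes back as a character column here, because it is taken from the list names.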
It seems you are only interested in the coefficients. In base R you can do:
mtcars$carb <- factor(mtcars$carb)
a <- coef(lm(mpg~ carb/(cyl+hp+wt) + 0, mtcars))
a
carb1 carb2 carb3 carb4 carb6 carb8 carb1:cyl
46.034632999 39.844515085 17.502670623 45.333909777 19.700000000 15.000000000 2.650503624
carb2:cyl carb3:cyl carb4:cyl carb6:cyl carb8:cyl carb1:hp carb2:hp
-0.846094229 NA -4.195214198 NA NA -0.230007961 -0.007452737
carb3:hp carb4:hp carb6:hp carb8:hp carb1:wt carb2:wt carb3:wt
NA 0.029361878 NA NA -5.231999683 -4.133340298 -0.311572700
carb4:wt carb6:wt carb8:wt
-1.239041102 NA NA
The coefficients above match the ones you have. You can easily convert them to a matrix:
matrix(a, nlevels(mtcars$carb))
[,1] [,2] [,3] [,4]
[1,] 46.03463 2.6505036 -0.230007961 -5.2319997
[2,] 39.84452 -0.8460942 -0.007452737 -4.1333403
[3,] 17.50267 NA NA -0.3115727
[4,] 45.33391 -4.1952142 0.029361878 -1.2390411
[5,] 19.70000 NA NA NA
[6,] 15.00000 NA NA NA
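This looks like your table, but without names. The naming part is easy if you know regular expressions: you can take the names directly from the vector a: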
dm <- list(unique(sub(":.*", "", names(a))),
replace(unique(gsub(".*:|.*\\d+", "", names(a))), 1, 'intercept'))
matrix(a, nlevels(mtcars$carb), dimnames = dm)
intercept cyl hp wt
carb1 46.03463 2.6505036 -0.230007961 -5.2319997
carb2 39.84452 -0.8460942 -0.007452737 -4.1333403
carb3 17.50267 NA NA -0.3115727
carb4 45.33391 -4.1952142 0.029361878 -1.2390411
carb6 19.70000 NA NA NA
carb8 15.00000 NA NA NA
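To get from that matrix to a data frame with carb as an ordinary column, as in the table the question asks for, something along these lines might work. It is only a sketch: it works on a copy called `cars` (a name introduced here) so the built-in mtcars is not modified, and it writes the column names by hand, assuming the columns come out in the order intercept, cyl, hp, wt as in the matrix above.

```r
# Work on a copy so the built-in mtcars is left untouched.
cars <- transform(mtcars, carb = factor(carb))

# Nested model: one intercept and one set of slopes per carb level.
a <- coef(lm(mpg ~ carb / (cyl + hp + wt) + 0, data = cars))

# Assumption: columns are in the order intercept, cyl, hp, wt.
m <- matrix(a, nlevels(cars$carb),
            dimnames = list(levels(cars$carb),
                            c("intercept", "cyl", "hp", "wt")))

# Move the row names into a numeric carb column.
out <- data.frame(carb = as.numeric(rownames(m)), m, row.names = NULL)
out
```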
If you don't know regular expressions, use split:
b <- mtcars$carb
d <- split(a, rep(levels(b), nlevels(b), length=length(a)))
array2DF(structure(d, dim = nlevels(b)))
Var1 carb1 carb1:cyl carb1:hp carb1:wt
1 1 46.03463 2.6505036 -0.230007961 -5.2319997
2 2 39.84452 -0.8460942 -0.007452737 -4.1333403
3 3 17.50267 NA NA -0.3115727
4 4 45.33391 -4.1952142 0.029361878 -1.2390411
5 6 19.70000 NA NA NA
6 8 15.00000 NA NA NA
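If you want to convince yourself that the single nested model really reproduces the separate per-group fits, a quick cross-check could look like the sketch below. The object names (`cars`, `nested_mat`, `separate_mat`) are made up for this example, and the comparison relies on the default tolerance of `all.equal()`.

```r
# Work on a copy so the built-in mtcars is left untouched.
cars <- transform(mtcars, carb = factor(carb))

# Nested model: one intercept and one set of slopes per carb level.
nested <- coef(lm(mpg ~ carb / (cyl + hp + wt) + 0, data = cars))
nested_mat <- matrix(nested, nlevels(cars$carb))

# Separate fits: one lm() per subset, coefficients stacked row-wise.
separate_mat <- t(sapply(split(cars, cars$carb),
                         function(d) coef(lm(mpg ~ cyl + hp + wt, data = d))))

# Compare values and NA positions, ignoring dimnames.
all.equal(unname(separate_mat), nested_mat)
```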