我已经看到一些关于使用tidy
,dplyr
和purrr
来使用表中的线性回归来预测一个值的例子。我想预测一个全新的数据帧,而不是只预测一个值。所以我有下一个数据:
library(tidyverse)
y <- rep(seq(0, 240, by = 40), each = 7)
x <- rep(1:7, times = 7)
vol <- c(300, 380, 430, 460, 480, 485, 489,
350, 445, 505, 540, 565, 580, 585,
380, 490, 560, 605, 635, 650, 655,
400, 525, 605, 655, 690, 710, 715,
415, 555, 655, 710, 740, 760, 765,
420, 570, 680, 740, 775, 800, 805,
422, 580, 695, 765, 805, 830, 835)
df <- as.data.frame(cbind(y, x, vol))
我曾经用它来创建这样的模型:
df.1 <- df %>%
group_by(y) %>%
do(mod = lm(vol ~ poly(x, 5), data = .))
df.1
看起来像这样:
# A tibble: 7 x 2
y mod
* <int> <list>
1 0 <S3: lm>
2 40 <S3: lm>
3 80 <S3: lm>
4 120 <S3: lm>
5 160 <S3: lm>
6 200 <S3: lm>
7 240 <S3: lm>
现在我想使用一个新的数据框并使用上面的模型来预测vol
的新值
newx <- data.frame(x = seq(1, 7, 0.001))
更新:答案应该是7个表,其中维度为6001x2,x值为1到7乘0.001,“vol”值表示来自'x'的预测。
要使用列表列,请使用purrr::map
(或lapply
)或变体对其进行迭代。根据需要使用tidyr::unnest
展开列。
library(tidyverse)
df <- data_frame(y = rep(seq(0, 240, by = 40), each = 7),
x = rep(1:7, times = 7),
vol = c(300, 380, 430, 460, 480, 485, 489,
350, 445, 505, 540, 565, 580, 585,
380, 490, 560, 605, 635, 650, 655,
400, 525, 605, 655, 690, 710, 715,
415, 555, 655, 710, 740, 760, 765,
420, 570, 680, 740, 775, 800, 805,
422, 580, 695, 765, 805, 830, 835))
df.1 <- df %>%
nest(-y) %>%
mutate(mods = map(data, ~lm(vol ~ poly(x, 5), data = .x)),
preds = map(mods, predict, newdata = data.frame(x = seq(1, 7, 0.001))))
df.1
#> # A tibble: 7 x 4
#> y data mods preds
#> <dbl> <list> <list> <list>
#> 1 0 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 2 40 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 3 80 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 4 120 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 5 160 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 6 200 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 7 240 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
另一种选择是使用augment
的broom
函数:
library(tidyverse)
library(broom)
tibble(y = df.1$y,
predictions = map(df.1$mod, augment, newdata = newx)) %>%
unnest() %>%
select(y, x, vol = .fitted) %>%
spread(y, vol)
# A tibble: 6,001 x 8
# x `0` `40` `80` `120` `160` `200` `240`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 300. 350. 380. 400. 415. 420. 422.
# 2 1.00 300. 350. 380. 400. 415. 420. 422.
# 3 1.00 300. 350. 380. 400. 415. 420. 422.
# 4 1.00 300. 350. 380. 400. 415. 420. 423.
# 5 1.00 300. 350. 381. 401. 416. 421. 423.
# 6 1.00 300. 351. 381. 401. 416. 421. 423.
# 7 1.01 301. 351. 381. 401. 416. 421. 423.
# 8 1.01 301. 351. 381. 401. 416. 421. 423.
# 9 1.01 301. 351. 381. 401. 416. 421. 423.
# 10 1.01 301. 351. 381. 401. 416. 421. 424.
# ... with 5,991 more rows