Using purrr map to iterate over a list of data frames, extract columns, and create a new data frame

Question (votes: 0, answers: 2)

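I'm practicing with the purrr package but am not yet comfortable with it. I'm trying to use purrr's map() function to iterate over a list of data frames and select specific columns to create a new data frame. I tried the following code, but it fails with an error: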

library(tidyverse)
    
df1 <- data.frame(x = 1:5, col1 = letters[1:5])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])

list_dataframes <- list(df1, df2, df3)

# would like to do something like this
new_dataframe <- map_df(list_dataframes, ~select(., c("col1", "col2", "col3")))
#> Error in `map()`:
#> i In index: 1.
#> Caused by error in `select()`:
#> ! Can't subset columns that don't exist.
#> x Column `col2` doesn't exist.

#> Backtrace:
#>      x
#>   1. +-purrr::map_df(list_dataframes, ~select(., c("col1", "col2", "col3")))
#>   2. | \-purrr::map(.x, .f, ...)
#>   3. |   \-purrr:::map_("list", .x, .f, ..., .progress = .progress)
#>   4. |     +-purrr:::with_indexed_errors(...)
#>   5. |     | \-base::withCallingHandlers(...)
#>   6. |     +-purrr:::call_with_cleanup(...)
#>   7. |     \-global .f(.x[[i]], ...)
#>   8. |       +-dplyr::select(., c("col1", "col2", "col3"))
#>   9. |       \-dplyr:::select.data.frame(., c("col1", "col2", "col3"))
#>  10. |         \-tidyselect::eval_select(expr(c(...)), data = .data, error_call = error_call)
#>  11. |           \-tidyselect:::eval_select_impl(...)
#>  12. |             +-tidyselect:::with_subscript_errors(...)
#>  13. |             | \-rlang::try_fetch(...)
#>  14. |             |   \-base::withCallingHandlers(...)
#>  15. |             \-tidyselect:::vars_select_eval(...)
#>  16. |               \-tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  17. |                 \-tidyselect:::eval_c(expr, data_mask, context_mask)
#>  18. |                   \-tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  19. |                     \-tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  20. |                       \-tidyselect:::eval_c(expr, data_mask, context_mask)
#>  21. |                         \-tidyselect:::reduce_sels(node, data_mask, context_mask, init = init)
#>  22. |                           \-tidyselect:::walk_data_tree(new, data_mask, context_mask)
#>  23. |                             \-tidyselect:::as_indices_sel_impl(...)
#>  24. |                               \-tidyselect:::as_indices_impl(...)
#>  25. |                                 \-tidyselect:::chr_as_locations(x, vars, call = call, arg = arg)
#>  26. |                                   \-vctrs::vec_as_location(...)
#>  27. \-vctrs (local) `<fn>`()
#>  28.   \-vctrs:::stop_subscript_oob(...)
#>  29.     \-vctrs:::stop_subscript(...)
#>  30.       \-rlang::abort(...)

Created on 2023-03-01 with reprex v2.0.2

Later edit: the following call runs, but it does not bind the columns the way I want:

map_df(list_dataframes, ~select_if(., is.character))

Any help or insight would be greatly appreciated!

r dataframe purrr
2 Answers

2 votes

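I think the problem is that the name "col1" (or col2, or col3) does not exist in every one of your data frames, and select() raises an error as soon as one requested column is missing. You can try: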

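# Keep only the columns whose names start with "col" in each data frame,
# then row-bind the results (columns absent from a data frame are filled with NA).
# Note: the earlier `return(.x) %>% select(...)` returned .x before select() ran.
new_dataframe <- map_df(list_dataframes, ~ select(.x, starts_with("col")))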
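
To see why select() fails on the first element, it can help to compare the requested names with the names each data frame actually has. The following is a minimal diagnostic sketch; the variable names wanted and missing_cols are introduced here for illustration and are not part of the question.

```r
library(dplyr)
library(purrr)

# Data from the question
df1 <- data.frame(x = 1:5, col1 = letters[1:5])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])
list_dataframes <- list(df1, df2, df3)

# Hypothetical helper variable: the columns we would like to extract
wanted <- c("col1", "col2", "col3")

# Column names of each data frame in the list
map(list_dataframes, names)

# Requested columns that are missing from each data frame;
# any non-empty entry makes a plain select(., c(...)) fail
missing_cols <- map(list_dataframes, ~ setdiff(wanted, names(.x)))
missing_cols
```

Every data frame lacks two of the three requested columns, which is why a selection helper that tolerates missing names (such as starts_with() or any_of()) is needed.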

2 votes

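You can use map_df (which binds by row) together with any_of(), which ignores requested columns that are not present. The output below was produced with the six-row df1 defined further down: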

map_df(list_dataframes, ~select(., any_of(c("col1", "col2", "col3"))))

   col1 col2 col3
1     a <NA> <NA>
2     b <NA> <NA>
3     c <NA> <NA>
4     d <NA> <NA>
5     e <NA> <NA>
6     f <NA> <NA>
7  <NA>    e <NA>
8  <NA>    f <NA>
9  <NA>    g <NA>
10 <NA>    h <NA>
11 <NA>    i <NA>
12 <NA>    j <NA>
13 <NA> <NA>    j
14 <NA> <NA>    k
15 <NA> <NA>    l
16 <NA> <NA>    m
17 <NA> <NA>    n
18 <NA> <NA>    o
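
As of purrr 1.0.0, map_df() is marked as superseded in favor of map() followed by list_rbind(). The sketch below shows the same row-binding in that style; it assumes purrr 1.0.0 or later and R 4.1 or later (for the |> pipe), and it uses the six-row version of df1. The variable names wanted and result are illustrative.

```r
library(dplyr)
library(purrr)

df1 <- data.frame(x = 1:6, col1 = letters[1:6])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])
list_dataframes <- list(df1, df2, df3)

wanted <- c("col1", "col2", "col3")

# any_of() keeps whichever of the requested columns exist and skips the rest;
# list_rbind() stacks the pieces by row and fills absent columns with NA
result <- list_dataframes |>
  map(~ select(.x, any_of(wanted))) |>
  list_rbind()

dim(result)
```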

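Alternatively, if every data frame in the list has exactly the same number of rows, you can use bind_cols(). For this I changed df1 as follows (list_dataframes must then be rebuilt from the new df1):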
df1 <- data.frame(x = 1:6, col1 = letters[1:6])

new_dataframe <- map(list_dataframes, ~select(., any_of(c("col1", "col2", "col3"))))
new_dataframe |> bind_cols()

  col1 col2 col3
1    a    e    j
2    b    f    k
3    c    g    l
4    d    h    m
5    e    i    n
6    f    j    o
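
If the data frames do not all have the same number of rows, as with the original five-row df1, bind_cols() will refuse to combine them. One possible workaround, sketched here under the assumption that padding the shorter columns with NA is acceptable, is to extend every piece to the largest row count first. The helper pad_rows() is hypothetical and not part of dplyr or purrr.

```r
library(dplyr)
library(purrr)

# Original data from the question: df1 has 5 rows, the others have 6
df1 <- data.frame(x = 1:5, col1 = letters[1:5])
df2 <- data.frame(x = 5:10, col2 = letters[5:10])
df3 <- data.frame(x = 10:15, col3 = letters[10:15])
list_dataframes <- list(df1, df2, df3)

# Hypothetical helper: extend every column of df to n elements;
# indexing past the end of a vector yields NA
pad_rows <- function(df, n) {
  tibble::as_tibble(lapply(df, function(col) col[seq_len(n)]))
}

cols <- map(list_dataframes, ~ select(.x, any_of(c("col1", "col2", "col3"))))
n_max <- max(map_int(cols, nrow))

# All pieces now have n_max rows, so they can be bound side by side
padded <- cols |>
  map(pad_rows, n = n_max) |>
  bind_cols()

padded
```

Whether NA padding is appropriate depends on the data: rows are aligned purely by position, so this only makes sense when the row order carries no meaning across the data frames.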