I have a custom function:
Reproducible example:
library(dplyr)
library(purrr)
# I have 4 data frames in a list, with a few observations that repeat across them
df_1 <- data.frame(col1 = c(1, 2, 3, 4), col2 = c('apple', 'pineapple', 'orange', 'grape'))
df_2 <- data.frame(col1 = c(2, 3, 4, 5, 6, 7), col2 = c('watermelon', 'orange', 'halibut', 'apple', 'iron', 'grape'))
df_3 <- data.frame(col1 = c(2, 3, 4, 5, 6, 7, 9, 0), col2 = c('rock', 'pineapple', 'apple', 'tire', 'bomb', 'star', 'coconut', 'grape'))
df_4 <- data.frame(col1 = c(1, 4, 9), col2 = c('grape', 'apple', 'rock'))
# All inside another list
df_list <- list(df_1, df_2, df_3, df_4)
# Now define a function that, for each data frame, keeps the rows where col2 matches var1 and doubles col1
toy_function <- function(df_list, var1) {
  map(df_list, ~ .x %>% filter(col2 == var1) %>% mutate(result = col1 * 2)) %>%
    bind_rows() %>%
    select(result)
}
# Solution from toy_function()
toy_function(df_list = df_list, var1 = 'apple')
Now, what I want to do is pass a vector of strings to toy_function(), like this:
# Vector of strings to pass to toy_function()
list_of_fruits <- c('apple', 'grape')
# This is where it all goes wrong
map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
# Error
Error in `map2()`:
! Can't recycle `.x` (size 4) to match `.y` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
The result I want from the function is:
map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
# Expected results
[[1]]
  result
1      2
2     10
3      8
4      8
[[2]]
  result
1      8
2     14
3      0
4      2
Edit
As pointed out in the comments, toy_function() should be modified to capture all variables:
toy_function <- function(df_list, var1) {
  map(df_list, ~ {
    filtered_df <- .x %>% filter(col2 %in% var1)
    filtered_df %>% mutate(result = col1 * 2) %>% select(result)
  }) %>%
    bind_rows()
}
But I still get this error:
> map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
Error in `map2()`:
! Can't recycle `.x` (size 4) to match `.y` (size 2).
Run `rlang::last_trace()` to see where the error occurred.
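map2 expects its arguments to have the same length (a length-1 argument is recycled) and iterates over them "in parallel". It works like this: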
## this map2 call
map2(.x = df_list, .y = list_of_fruits, .f = toy_function)
## is equivalent to this:
list(
  toy_function(df_list[[1]], list_of_fruits[[1]]),
  toy_function(df_list[[2]], list_of_fruits[[2]]),
  toy_function(df_list[[3]], list_of_fruits[[3]]),
  ...
)
Notice how df_list and list_of_fruits are iterated over at the same time.
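To make the pairing concrete, here is a minimal sketch that uses small numeric lists instead of the data frames above. The names x, y, paired, and recycled are made up for illustration; the only assumption is that purrr is installed and R is at least 4.1 (for the `\(a, b)` lambda syntax).

```r
library(purrr)

x <- list(1, 2, 3)
y <- list(10, 20, 30)

# map2() calls the function once per position: f(x[[1]], y[[1]]), f(x[[2]], y[[2]]), ...
paired <- map2(x, y, \(a, b) a + b)

# A length-1 argument is recycled to match the other one
recycled <- map2(x, 100, \(a, b) a + b)
```

Any other length mismatch, such as 4 against 2 in the question, cannot be paired up this way, which is what the recycling error reports.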
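You don't want that. You wrote toy_function so that it already expects a list as its first argument and uses map internally. You don't need another wrapper to iterate over df_list. You only need to iterate over one object, the vector of fruits: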
map(list_of_fruits, \(fruit) toy_function(df_list, fruit))
# [[1]]
#   result
# 1      2
# 2     10
# 3      8
# 4      8
#
# [[2]]
#   result
# 1      8
# 2     14
# 3      0
# 4      2
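If it helps to know which result belongs to which fruit, one option is to name the vector before mapping over it. This is only a sketch: it assumes purrr's set_names(), which by default names each element after its own value, and to stay short it uses just df_1 and df_4 from the question. The name results is hypothetical.

```r
library(dplyr)
library(purrr)

# Subset of the question's data (df_1 and df_4 only), to keep the sketch short
df_list <- list(
  data.frame(col1 = c(1, 2, 3, 4), col2 = c("apple", "pineapple", "orange", "grape")),
  data.frame(col1 = c(1, 4, 9), col2 = c("grape", "apple", "rock"))
)

# Original toy_function() from the question
toy_function <- function(df_list, var1) {
  map(df_list, ~ .x %>% filter(col2 == var1) %>% mutate(result = col1 * 2)) %>%
    bind_rows() %>%
    select(result)
}

list_of_fruits <- c("apple", "grape")

# set_names() with no second argument uses the values themselves as names,
# so map() returns a list keyed by fruit
results <- list_of_fruits %>%
  set_names() %>%
  map(\(fruit) toy_function(df_list, fruit))

results$apple
```

The iteration is the same as in the map() call above; only the names on the output list change, so results$grape can be looked up directly instead of by position.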