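Here is the challenge: I need to use dplyr to compute the row-wise minimum over a subset of columns in a data frame, but the column names vary and must match a specified set of values stored in mycols. Here are some attempts that do not work: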
library(dplyr)

# Given data
df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

# Attempt to calculate row-wise minimum using specified columns
mycols <- c("x2", "x5")

# Invalid attempt using matches
df <- df %>%
  rowwise() %>%
  mutate(min = min(select(matches(mycols))))
# Error: is.string(match) is not TRUE

# Another invalid attempt using one_of
df <- df %>%
  rowwise() %>%
  mutate(min = min(select(one_of(mycols))))
# Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')"
# Warning message: In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`

# Invalid use of select_
df <- df %>%
  rowwise() %>%
  mutate(min = min(select_(mycols)))
# Error: no applicable method for 'select_' applied to an object of class "character"

# Similarly invalid attempt with matches and select_
df <- df %>%
  rowwise() %>%
  mutate(min = min(select_(matches(mycols))))
# Error: is.string(match) is not TRUE

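It looks like I have run into a problem with selecting columns dynamically. Further exploration, perhaps in the documentation, may reveal a way forward.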
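Here is another, slightly more technical solution that uses the purrr package from the tidyverse, which is designed for functional programming.
First, matches, one of the dplyr select helpers, takes a regular expression string as its argument, not a vector. Writing one regular expression that matches all of the columns you want is a good way to select them.
(In the code below, you can use whichever dplyr select helper you prefer.)
Second, purrr functions work well with dplyr once you understand the underlying functional programming approach.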
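To make the functional programming idea concrete, here is a minimal sketch of what pmap_dbl does when it is handed a data frame. The data frame toy and its columns a and b are made up for illustration and are not part of the question. Because a data frame is a list of equal-length columns, pmap_dbl walks those columns in parallel and calls the function once per row.

```r
library(purrr)

# Hypothetical two-row data frame, used only for illustration
toy <- data.frame(a = c(1, 5), b = c(4, 2))

# Each call receives one value per column: min(a = 1, b = 4), then min(a = 5, b = 2)
row_mins <- pmap_dbl(toy, min)
row_mins
#> [1] 1 2
```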
To solve your problem:
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

# regex to get only the x2 and x5 columns
mycols <- "x[25]"

df %>%
  mutate(min_x2_x5 =
    # select the columns that you want from df
    select(., matches(mycols)) %>%
      # use pmap on this subset to get a vector holding the min of each row.
      # a data frame is a list of columns, so pmap walks the columns in
      # parallel and calls min once per row
      purrr::pmap_dbl(min)
  )
#> x1 x2 x3 x4 x5 min_x2_x5
#> 1 2 3 0 1 1 1
#> 2 0 2 1 0 1 1
#> 3 0 NA 0 NA NA NA
#> 4 NA 5 1 3 1 1
#> 5 0 3 3 0 3 3
#> 6 1 2 0 0 4 2
#> 7 1 NA NA NA NA NA
#> 8 NA NA NA 0 3 NA
#> 9 0 4 0 0 3 3
#> 10 1 5 1 1 1 1
I won't explain purrr any further here, but it works well for your case.
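Since matches takes a single regular expression, the character vector from the question can be collapsed into one pattern before it is passed in. The following is only a sketch: the name pattern is introduced here for illustration, and it assumes the column names contain no regex metacharacters (otherwise they would need escaping first).

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

# Column names as a character vector, as in the question
mycols <- c("x2", "x5")

# Collapse the vector into one anchored alternation: "^(x2|x5)$"
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")

# Select the matching columns explicitly from df, then take the min of each row
res <- df %>%
  mutate(min_x2_x5 = purrr::pmap_dbl(select(df, matches(pattern)), min))
```

Anchoring the pattern with ^ and $ keeps it from also picking up columns whose names merely contain x2 or x5.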
This is a bit tricky. With SE (standard evaluation), you need to pass the operation as a string.
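mycols <- '(x2,x5)'
# build the expression as a string: "min(x2,x5)"
f <- paste0('min', mycols)
# mutate_() evaluates the string (note: deprecated since dplyr 0.7.0);
# assign the result so that printing df shows the new column
df <- df %>% rowwise() %>% mutate_(min = f)
df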
# A tibble: 10 × 6
# x1 x2 x3 x4 x5 min
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 2 3 0 1 1 1
#2 0 2 1 0 1 1
#3 0 NA 0 NA NA NA
#4 NA 5 1 3 1 1
#5 0 3 3 0 3 3
#6 1 2 0 0 4 2
#7 1 NA NA NA NA NA
#8 NA NA NA 0 3 NA
#9 0 4 0 0 3 3
#10 1 5 1 1 1 1
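mutate_ and the other underscore verbs are deprecated in current dplyr, so here is a sketch of how the same row-wise minimum might be written today. It assumes dplyr 1.0.0 or later, where c_across is available, and uses all_of to select columns from a character vector; it is one possible approach, not the only one.

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)

# Column names held in a character vector, as in the question
mycols <- c("x2", "x5")

res <- df %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(min = min(c_across(all_of(mycols)))) %>%
  # drop the row-wise grouping so later verbs behave normally
  ungroup()
```

For larger data frames, a vectorized base R call such as do.call(pmin, df[mycols]) computes the same values without row-by-row evaluation.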