I'm trying to get the mean() and sum() of certain columns within each row. This code produces the dataset:
library(tidyverse)
test_data <- tibble(part_id = 1:5,
                    a_1 = c("a", "b", "c", "d", "a"),
                    a_2 = c("b", NA, "b", "a", "d"),
                    a_3 = c("b", "b", "d", "d", "a"))
test_data <- test_data %>%
  mutate_at(vars(a_1, a_2), .funs = list(scored = ~case_when(
    . == "a" | . == "b" ~ 1,
    . == "c" ~ 0,
    . == "d" ~ -100)))
If I use rowSums() or rowMeans(), I get the correct answer:
library(tidyverse)
test_data <- test_data %>%
  mutate(a_total = rowSums(dplyr::select(., contains("scored")), na.rm = TRUE),
         a_mean = rowMeans(dplyr::select(., contains("scored")), na.rm = TRUE))
However, if I use rowwise() followed by sum() or mean(), it does not work:
library(tidyverse)
test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(dplyr::select(., contains("scored")), na.rm = TRUE),
         a_mean = base::mean(dplyr::select(., contains("scored")), na.rm = TRUE)) %>%
  ungroup()
For sum(), it returns the grand total across all rows, effectively ignoring rowwise(). For mean(), every result is NA and I get this warning for each row:
Warning messages:
1: In mean.default(dplyr::select(., contains("scored")), na.rm = TRUE) :
argument is not numeric or logical: returning NA
I also tried wrapping the selection in c(), as I would if I listed each column individually. That produces the following error:
library(tidyverse)
test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(c(dplyr::select(., contains("scored"))), na.rm = TRUE),
         a_mean = base::mean(c(dplyr::select(., contains("scored"))), na.rm = TRUE)) %>%
  ungroup()
Error in base::sum(c(dplyr::select(., contains("scored"))), na.rm = TRUE) :
invalid 'type' (list) of argument
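How can I make this work with rowwise()? And why does it behave so differently from what I would normally expect, and from rowSums() or rowMeans()?
I appreciate any insight!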
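The issue is that rowwise groups by row, whereas sum, mean, etc. operate on a vector. Here they are being handed a data.frame instead. Wrapping the selection in unlist converts it from a data.frame to a vector: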
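library(dplyr)
# Inside mutate(), the magrittr placeholder `.` is the whole piped data frame,
# not the current row, so select(., ...) would total every row.
# pick() (this assumes dplyr >= 1.1.0) returns the current row as a one-row tibble.
test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(unlist(pick(contains("scored")),
                                    recursive = FALSE), na.rm = TRUE),
         a_mean = base::mean(unlist(pick(contains("scored")),
                                    recursive = FALSE), na.rm = TRUE)) %>%
  ungroup()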
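A further variant, sketched here on the assumption that dplyr >= 1.0.0 is installed, uses c_across(), which returns the selected columns of the current row as a plain vector, so no unlist is needed. The `scored` tibble is a hypothetical stand-in for the scored columns built in the question.

```r
library(dplyr)

# Hypothetical stand-in for the scored columns from the question.
scored <- tibble(a_1_scored = c(1, 1, 0, -100, 1),
                 a_2_scored = c(1, NA, 1, 1, -100))

result <- scored %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into one vector
  mutate(a_total = sum(c_across(contains("scored")), na.rm = TRUE),
         a_mean = mean(c_across(contains("scored")), na.rm = TRUE)) %>%
  ungroup()

print(result)
```

Because c_across() is evaluated once per row, the result should match what rowSums() and rowMeans() give for the same columns.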
Another option is pmap from purrr, which needs no rowwise():
library(purrr)
test_data %>%
  mutate(a_total = pmap_dbl(select(., contains("scored")),
                            ~ sum(c(...), na.rm = TRUE)),
         a_mean = pmap_dbl(select(., contains("scored")),
                           ~ mean(c(...), na.rm = TRUE)))
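As for why the attempts in the question behave so differently, the following base R sketch may help. It uses a small hypothetical data frame `df` (not the question's data) to show how sum() and mean() treat a data.frame, a list, and a vector.

```r
# Hypothetical two-column data frame standing in for the selected columns.
df <- data.frame(x = c(1, 2), y = c(3, NA))

# sum() accepts a data frame (via the Summary group method) and totals
# every cell, ignoring row boundaries.
total_df <- sum(df, na.rm = TRUE)

# mean() has no data.frame method: it warns and returns NA.
mean_df <- suppressWarnings(mean(df, na.rm = TRUE))

# c() on a data frame yields a plain list, which sum() rejects with an error.
sum_list_msg <- tryCatch(sum(c(df), na.rm = TRUE),
                         error = function(e) conditionMessage(e))

# unlist() flattens the columns into one numeric vector that both accept.
v <- unlist(df)
total_vec <- sum(v, na.rm = TRUE)
mean_vec <- mean(v, na.rm = TRUE)

# rowSums() treats its input as a matrix and works across columns per row.
row_totals <- rowSums(df, na.rm = TRUE)
```

This mirrors the three outcomes reported in the question: a grand total from sum(), NA with a warning from mean(), and the "invalid 'type' (list)" error once c() is added.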
If you want to stick with rowwise(), here is another approach that uses {rlang} to capture the variables you want to sum and average:
library(dplyr)
test_data <- tibble(part_id = 1:5,
                    a_1 = c("a", "b", "c", "d", "a"),
                    a_2 = c("b", NA, "b", "a", "d"),
                    a_3 = c("b", "b", "d", "d", "a"))
test_data <- test_data %>%
  mutate_at(vars(a_1, a_2), .funs = list(scored = ~case_when(
    . == "a" | . == "b" ~ 1,
    . == "c" ~ 0,
    . == "d" ~ -100)))
# Get the names of the variables you want
vars <- test_data %>% select(contains("scored")) %>% names()
# Use `rlang` so that `dplyr` will recognize the variable names
test_data %>%
  rowwise() %>%
  mutate(a_sum = sum(c(!!!rlang::syms(vars)), na.rm = TRUE),
         a_mean = mean(c(!!!rlang::syms(vars)), na.rm = TRUE)) %>%
  ungroup()
#> # A tibble: 5 x 8
#>   part_id a_1   a_2   a_3   a_1_scored a_2_scored a_sum a_mean
#>     <int> <chr> <chr> <chr>      <dbl>      <dbl> <dbl>  <dbl>
#> 1       1 a     b     b              1          1     2    1
#> 2       2 b     <NA>  b              1         NA     1    1
#> 3       3 c     b     d              0          1     1    0.5
#> 4       4 d     a     d           -100          1   -99  -49.5
#> 5       5 a     d     a              1       -100   -99  -49.5
Created on 2020-04-05 by the reprex package (v0.3.0)
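To see what the `!!!rlang::syms(vars)` step does, it can help to build the expression without evaluating it. This is a small sketch assuming the rlang package is installed. The `vars` vector is written out by hand here and is assumed to match the names returned by select(contains("scored")).

```r
library(rlang)

# Hypothetical column names, assumed to match the scored columns above.
vars <- c("a_1_scored", "a_2_scored")

# syms() turns the strings into symbols; !!! splices them in as separate
# arguments to c(), as if each column had been typed out by hand.
expanded <- expr(sum(c(!!!syms(vars)), na.rm = TRUE))
print(expanded)
#> sum(c(a_1_scored, a_2_scored), na.rm = TRUE)
```

Inside a rowwise() mutate, each of those bare column names refers to the single value in the current row, which is why c() then produces an ordinary numeric vector for sum() and mean().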