rowwise() gives wrong results with mean() and sum()

Question (votes: 1, answers: 2)

I am trying to compute the mean() and sum() of certain columns within each row. This code produces the dataset:

library(tidyverse)

test_data <- tibble(part_id = 1:5,
                      a_1 = c("a", "b", "c", "d", "a"),
                      a_2 = c("b", NA, "b", "a", "d"),
                      a_3 = c("b", "b", "d", "d", "a"))


test_data <- test_data %>%
  mutate_at(vars(a_1, a_2), .funs = list(scored = ~case_when(
    . == "a" | . == "b" ~ 1,
    . == "c" ~ 0,
    . == "d" ~ -100)))

If I use rowSums() or rowMeans(), I get the correct answer:

library(tidyverse)

test_data <- test_data %>%
  mutate(a_total = rowSums(dplyr::select(., contains("scored")), na.rm = TRUE),
         a_mean = rowMeans(dplyr::select(., contains("scored")), na.rm = TRUE))

However, if I use rowwise() followed by sum() or mean(), it does not work as expected:

library(tidyverse)

test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(dplyr::select(., contains("scored")), na.rm = TRUE),
         a_mean = base::mean(dplyr::select(., contains("scored")), na.rm = TRUE)) %>%
  ungroup()

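For sum(), every row gets the grand total across all rows, so rowwise() is effectively ignored. For mean(), every result is NA, and I get this warning for each row: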

Warning messages:
1: In mean.default(dplyr::select(., contains("scored")), na.rm = TRUE) :
  argument is not numeric or logical: returning NA
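
Both symptoms can be reproduced outside the pipeline. The following is a minimal sketch, assuming (the question does not state this) that the `.` placeholder refers to the whole piped data frame rather than the current row. The name `scored` is a hypothetical stand-in for the result of dplyr::select(., contains("scored")).

```r
# Hypothetical stand-in for dplyr::select(., contains("scored"))
scored <- data.frame(
  a_1_scored = c(1, 1, 0, -100, 1),
  a_2_scored = c(1, NA, 1, 1, -100)
)

# sum() has a data.frame method (Summary group generic), so it adds up
# every cell of every row and returns one grand total.
grand_total <- sum(scored, na.rm = TRUE)

# mean() has no data.frame method, so mean.default() returns NA and warns
# "argument is not numeric or logical". The warning is silenced here.
grand_mean <- suppressWarnings(mean(scored, na.rm = TRUE))

# rowSums() and rowMeans() work across the columns within each row.
row_totals <- rowSums(scored, na.rm = TRUE)
row_means <- rowMeans(scored, na.rm = TRUE)
```

Under that assumption, sum() and mean() receive a multi-row data frame on every iteration, which would account for the identical total in each row and the repeated warning.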

I also tried wrapping the selection in c(), as I would when listing each column individually. That produces the following error:

library(tidyverse)

test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(c(dplyr::select(., contains("scored"))), na.rm = TRUE),
         a_mean = base::mean(c(dplyr::select(., contains("scored"))), na.rm = TRUE)) %>%
  ungroup()

Error in base::sum(c(dplyr::select(., contains("scored"))), na.rm = TRUE) : 
  invalid 'type' (list) of argument
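
The error message can be traced to what c() returns for a data frame. This is a minimal sketch in base R, where `scored` is again a hypothetical stand-in for the selected columns:

```r
scored <- data.frame(
  a_1_scored = c(1, 1, 0, -100, 1),
  a_2_scored = c(1, NA, 1, 1, -100)
)

# c() on a data frame drops the class and returns a plain list of columns.
as_list <- c(scored)
is_plain_list <- is.list(as_list) && !is.data.frame(as_list)

# sum() cannot handle a list, hence "invalid 'type' (list) of argument".
err <- tryCatch(
  sum(as_list, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

# unlist() flattens the list into a numeric vector that sum() accepts,
# but the vector still holds the values of all rows, not of a single row.
flat_total <- sum(unlist(as_list), na.rm = TRUE)
```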

How can I make this work with rowwise()? And why does it behave so differently from what I would expect, and from rowSums() or rowMeans()?

I appreciate any insight!

r dplyr sum mean rowwise
2 Answers
1 vote

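The issue is that rowwise() groups by row, whereas sum(), mean(), etc. operate on a vector. Here they are effectively being applied to a single-row data.frame. Wrapping the selection in unlist() converts it from a data.frame to a vector: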

library(dplyr)
test_data <- test_data %>%
  rowwise() %>%
  mutate(a_total = base::sum(unlist(dplyr::select(., contains("scored")),
                                    recursive = FALSE), na.rm = TRUE),
         a_mean = base::mean(unlist(dplyr::select(., contains("scored")),
                                    recursive = FALSE), na.rm = TRUE)) %>%
  ungroup()
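
It is worth checking the output of this approach row by row. In a magrittr pipeline, `.` is bound to the entire left-hand-side data frame, so dplyr::select(., ...) inside mutate() still sees every row even after rowwise(). A possible alternative, sketched here on the assumption that dplyr 1.0.0 or later is installed (a release that may postdate this answer), is c_across(), which collects the selected columns of the current row into a vector. The tibble `scored_data` is a hypothetical shortcut that enters the scored columns directly.

```r
library(dplyr)

# Hypothetical shortcut: the scored columns from the question, typed in directly
scored_data <- tibble(
  part_id = 1:5,
  a_1_scored = c(1, 1, 0, -100, 1),
  a_2_scored = c(1, NA, 1, 1, -100)
)

row_stats <- scored_data %>%
  rowwise() %>%
  mutate(
    # c_across() returns the current row's values for the selected columns
    a_total = sum(c_across(contains("scored")), na.rm = TRUE),
    a_mean = mean(c_across(contains("scored")), na.rm = TRUE)
  ) %>%
  ungroup()
```

In this sketch each row's a_total and a_mean should agree with what rowSums() and rowMeans() return for the same columns.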

Or use pmap:

library(purrr)
test_data  %>%
   mutate(a_total = pmap_dbl(select(., contains("scored")),
                    ~ sum(c(...), na.rm = TRUE)),
          a_mean =  pmap_dbl(select(., contains("scored")),
                    ~ mean(c(...), na.rm = TRUE)))
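
To illustrate why the pmap variant works per row without rowwise(), here is a minimal sketch, with `scored` as a hypothetical stand-in for the selected columns. pmap_dbl() treats the data frame as a list of columns and calls the function once per row.

```r
library(purrr)

scored <- data.frame(
  a_1_scored = c(1, 1, 0, -100, 1),
  a_2_scored = c(1, NA, 1, 1, -100)
)

# Each call receives one row's values as arguments; c(...) gathers them
# into a vector that sum() and mean() can reduce.
row_totals <- pmap_dbl(scored, function(...) sum(c(...), na.rm = TRUE))
row_means <- pmap_dbl(scored, function(...) mean(c(...), na.rm = TRUE))
```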

0 votes

If you want to stick with rowwise(), here is another approach that uses {rlang} to capture the variables you want to sum and average:

library(dplyr)

test_data <- tibble(part_id = 1:5,
                    a_1 = c("a", "b", "c", "d", "a"),
                    a_2 = c("b", NA, "b", "a", "d"),
                    a_3 = c("b", "b", "d", "d", "a"))

test_data <- test_data %>%
  mutate_at(vars(a_1, a_2), .funs = list(scored = ~case_when(
    . == "a" | . == "b" ~ 1,
    . == "c" ~ 0,
    . == "d" ~ -100)))

# Get the names of the variables you want
vars <- test_data %>%
  select(contains("scored")) %>%
  names()

# Use `rlang` so that `dplyr` will recognize the variable names
test_data %>%
  rowwise() %>%
  mutate(a_sum = sum(c(!!!rlang::syms(vars)), na.rm = TRUE),
         a_mean = mean(c(!!!rlang::syms(vars)), na.rm = TRUE)) %>%
  ungroup()
#> # A tibble: 5 x 8
#>   part_id a_1   a_2   a_3   a_1_scored a_2_scored a_sum a_mean
#>     <int> <chr> <chr> <chr>      <dbl>      <dbl> <dbl>  <dbl>
#> 1       1 a     b     b              1          1     2    1
#> 2       2 b     <NA>  b              1         NA     1    1
#> 3       3 c     b     d              0          1     1    0.5
#> 4       4 d     a     d           -100          1   -99  -49.5
#> 5       5 a     d     a              1       -100   -99  -49.5

Created on 2020-04-05 by the reprex package (v0.3.0)
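
The splicing step can be inspected on its own. This is a minimal sketch, assuming the rlang package is installed. The two column names are typed in by hand rather than derived from the data, and the `data` list passed to eval_tidy() is a hypothetical single row.

```r
library(rlang)

vars <- c("a_1_scored", "a_2_scored")

# syms() turns the character vector into a list of symbols, and !!!
# splices them into the call as separate arguments.
spliced <- expr(sum(c(!!!syms(vars)), na.rm = TRUE))
spliced_text <- deparse(spliced)

# Evaluating the expression against one row's values mimics what happens
# inside rowwise() %>% mutate().
row_two <- eval_tidy(spliced, data = list(a_1_scored = 1, a_2_scored = NA))
```

Because the spliced call refers to the columns by bare name, dplyr evaluates it against the current row under rowwise(), which would explain why this answer yields per-row results.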
