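I have a data frame with several scales. I want to compute the mean and sum for each participant, as well as the mean and sum for each scale. I can't figure out how to use pmap_dbl to get my result. I tried writing a function, but it failed.
Here is a sample of the data: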
library(tidyverse)
df <- tibble(tep_1 = sample(c(0,1), 5, replace = TRUE),
tep_2 = sample(c(0,1), 5, replace = TRUE),
adarta_1 = sample(c(0,1), 5, replace = TRUE),
adarta_2 = sample(c(0,1), 5, replace = TRUE),
adarta_3 = sample(c(0,1), 5, replace = TRUE),
adarta_4 = sample(c(0,1), 5, replace = TRUE),
adarta_5 = sample(c(0,1), 5, replace = TRUE),
adarta_6 = sample(c(0,1), 5, replace = TRUE))
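Here is my function, which does not work. Note: it only attempts the row sums, but I also need the row means, the grand mean, and the standard deviation: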
column_prefix <- c("tep", "adarta")
my_fun <- function(x, y) {
x %>%
select(starts_with(y)) %>%
rowSums(x, na.rm = TRUE)
}
map2_dbl(.x = df, .y = column_prefix, .f = my_fun)
Error: Mapped vectors must have consistent lengths:
* `.x` has length 8
* `.y` has length 2
I would like to get this working so that I can use the function to produce this output:
library(tidyverse)
df <- df %>%
mutate(tep_grand_mean = mean(c(tep_1, tep_2)),
tep_sd = sd(tep_grand_mean),
adarta_grand_mean = mean(c(adarta_1, adarta_2, adarta_3, adarta_4, adarta_5, adarta_6)),
adarta_sd = sd(adarta_grand_mean),
tep_sum = pmap_dbl(select(., starts_with("tep")), sum),
tep_mean = rowMeans(select(., contains("tep")), na.rm = TRUE),
adarta_sum = pmap_dbl(select(., starts_with("adarta")), sum),
adarta_mean = rowMeans(select(., contains("adarta")), na.rm = TRUE))
~~~~~
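Here, after a small change to the function, map alone may be enough. The change is to drop x from the rowSums call, because the pipe already passes the selected columns as the first argument:
my_fun <- function(x, y) {
  x %>%
    select(starts_with(y)) %>%
    rowSums(na.rm = TRUE)
}
map(column_prefix, my_fun, x = df)
#[[1]]
#[1] 0 0 2 2 1
#[[2]]
#[1] 4 2 0 1 4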
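map2 is meant for cases where both objects have the same length, or where one of them has a single element. In the question, df is iterated column by column (length 8) against two prefixes, hence the error. To pass the whole data frame, wrap it in list so that it is recycled.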
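As a rough sketch of that recycling (not part of the original answer): toy_df below is a made-up, fixed data frame used only so the result is reproducible, and my_fun is the corrected function repeated so the snippet runs on its own.

```r
library(tidyverse)

# Hypothetical fixed data, for illustration only
toy_df <- tibble(tep_1 = c(0, 1, 1),
                 tep_2 = c(1, 1, 0),
                 adarta_1 = c(0, 0, 1),
                 adarta_2 = c(1, 0, 1))
column_prefix <- c("tep", "adarta")

my_fun <- function(x, y) {
  x %>%
    select(starts_with(y)) %>%
    rowSums(na.rm = TRUE)
}

# list(toy_df) has length 1, so map2 recycles it against both prefixes
res <- map2(list(toy_df), column_prefix, my_fun)
# res[[1]] holds the tep row sums (1, 2, 1)
# res[[2]] holds the adarta row sums (1, 0, 2)
```

This gives the same result as map(column_prefix, my_fun, x = toy_df); which one to use is mostly a matter of taste.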
If the mean is needed for each group of columns sharing a prefix, one option is split.default:
library(stringr)
df %>%
split.default(str_remove(names(.), "_\\d+$")) %>%
map_df(rowMeans)
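To get all four summaries from the question in one go, something along these lines might work. This is only a sketch: scale_summary is a hypothetical helper, toy_df is made-up data, and it assumes the grand mean and sd are meant to be taken over every item response in a scale, which the question does not state outright.

```r
library(tidyverse)

# Hypothetical fixed data, for illustration only
toy_df <- tibble(tep_1 = c(0, 1, 1),
                 tep_2 = c(1, 1, 0),
                 adarta_1 = c(0, 0, 1),
                 adarta_2 = c(1, 0, 1))
column_prefix <- c("tep", "adarta")

# Hypothetical helper: row-wise and scale-wide summaries for one prefix
scale_summary <- function(data, prefix) {
  items <- select(data, starts_with(paste0(prefix, "_")))
  values <- unlist(items, use.names = FALSE)   # all responses in the scale
  row_sum <- unname(rowSums(items, na.rm = TRUE))
  row_mean <- unname(rowMeans(items, na.rm = TRUE))
  overall_mean <- mean(values, na.rm = TRUE)   # assumption: pooled over items
  overall_sd <- sd(values, na.rm = TRUE)       # assumption: pooled over items
  out <- tibble(sum = row_sum, mean = row_mean,
                grand_mean = overall_mean, sd = overall_sd)
  names(out) <- paste(prefix, names(out), sep = "_")
  out
}

summaries <- map(column_prefix, scale_summary, data = toy_df)
# Bind the original columns and the per-scale summaries side by side
result <- bind_cols(c(list(toy_df), summaries))
```

The result keeps the original columns and adds tep_sum, tep_mean, tep_grand_mean, tep_sd and the matching adarta_ columns; the scale-wide values are simply repeated on every row.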