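Here is a (shortened) sample of the dataset I am working with. The sample represents data from an experiment with 2 sessions (session_number); in each session, participants completed 5 handgrip trials (trial_number), so 10 trials in total (2 * 5 = 10). Each of the 5 trials has 3 observations of handgrip strength (percent_of_maximum). I would like to get the mean of the 3 observations for each of the 10 trials (below, I call this mean_by_trial).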
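Finally, and this is where I am stuck, I want to output a dataset with 20 rows (one row per unique trial; there are 2 participants with 10 trials each, so 2 * 10 = 20) AND retain all the other variables. All the other variables (in the sample: placebo, support, personality, and perceived_difficulty) are the same for each unique Participant, trial_number, or session_number (see the sample dataset below).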
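I tried this with ddply, which gives me almost what I want, but the new dataset does not contain the other variables from the original dataset (new_dat only contains trial_number, session_number, Participant, and the new mean_by_trial variable). How can I keep the other variables?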
#create sample data frame
dat <- data.frame(
  Participant = rep(1:2, each = 30),
  placebo = c(replicate(15, "placebo"), replicate(15, "control"), replicate(15, "control"), replicate(15, "placebo")),
  support = rep(sort(rep(c("support", "control"), 3)), 10),
  personality = c(replicate(30, "nice"), replicate(30, "naughty")),
  session_number = c(rep(1:2, each = 15), rep(1:2, each = 15)),
  trial_number = c(rep(1:5, each = 3), rep(1:5, each = 3), rep(1:5, each = 3), rep(1:5, each = 3)),
  percent_of_maximum = runif(60, min = 0, max = 100),
  perceived_difficulty = runif(60, min = 50, max = 100)
)
#this is what I have tried so far
library(plyr)
new_dat <- ddply(dat, .(trial_number, session_number, Participant), summarise,
                 mean_by_trial = mean(percent_of_maximum), .drop = FALSE)
I want new_dat to contain all the variables in dat, plus the mean_by_trial variable. Thanks!
Here is a tidyverse answer. First, group_by the variables of interest. Then use mutate to compute the desired mean in a new column.
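Because the new mean is repeated on every row of its group, use the distinct function to keep only the unique rows. In other words, select one row for each combination of Participant, session_number, and trial_number.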
This approach comes from this answer (https://stackoverflow.com/a/39092166/9941764): R - dplyr Summarize and Retain Other Columns
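library(dplyr)
new_dat <- dat %>%
  group_by(Participant, session_number, trial_number) %>%
  # add the group mean to every row of the group
  mutate(mean = mean(percent_of_maximum)) %>%
  # keep one row per group, retaining all other columns
  distinct(mean, .keep_all = TRUE)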
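As a quick sanity check on the mutate-then-distinct steps described above, the sketch below rebuilds the sample data and confirms that the result has 20 rows and still carries every original column. The fixed seed, the column name mean_by_trial (instead of mean), and the final ungroup() are assumptions added here for illustration; the data frame is written with rep() but is meant to match the sample in the question.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random columns are reproducible

# Same layout as the sample data in the question
dat <- data.frame(
  Participant = rep(1:2, each = 30),
  placebo = c(rep("placebo", 15), rep("control", 30), rep("placebo", 15)),
  support = rep(rep(c("control", "support"), each = 3), 10),
  personality = rep(c("nice", "naughty"), each = 30),
  session_number = rep(rep(1:2, each = 15), 2),
  trial_number = rep(rep(1:5, each = 3), 4),
  percent_of_maximum = runif(60, min = 0, max = 100),
  perceived_difficulty = runif(60, min = 50, max = 100)
)

new_dat <- dat %>%
  group_by(Participant, session_number, trial_number) %>%
  mutate(mean_by_trial = mean(percent_of_maximum)) %>%
  distinct(mean_by_trial, .keep_all = TRUE) %>%
  ungroup()

nrow(new_dat)                         # number of rows kept (one per trial)
setdiff(names(dat), names(new_dat))   # original columns missing from the result
```

If the second call returns an empty character vector, no original variable was dropped.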
We can use mutate instead of summarise to create the column in the dataset, and then take the first row of each group with slice
library(plyr)   # needed for ddply() and .(); load it before dplyr
library(dplyr)
out <- ddply(dat, .(trial_number, session_number, Participant),
             plyr::mutate, mean_by_trial = mean(percent_of_maximum), .drop = FALSE)
# keep the first row of each trial
out %>%
  group_by(trial_number, session_number, Participant) %>%
  slice(1)
If we use dplyr alone, this can all be done in a single chain
newdat <- dat %>%
  group_by(trial_number, session_number, Participant) %>%
  mutate(mean_by_trial = mean(percent_of_maximum)) %>%
  slice(1)
head(newdat)
# A tibble: 6 x 9
# Groups:   trial_number, session_number, Participant [6]
#   Participant placebo support personality session_number trial_number percent_of_maximum perceived_difficulty mean_by_trial
#         <int> <fct>   <fct>   <fct>                <int>        <int>              <dbl>                <dbl>         <dbl>
# 1           1 placebo control nice                     1            1               71.5                 95.5          73.9
# 2           2 control control naughty                  1            1               38.9                 63.8          67.7
# 3           1 control support nice                     2            1               97.1                 54.2          68.4
# 4           2 placebo support naughty                  2            1               62.9                 86.2          40.4
# 5           1 placebo support nice                     1            2               49.0                 95.8          65.7
# 6           2 control support naughty                  1            2               80.9                 74.6          68.3
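Note that slice(1) (like distinct) keeps the values from the first row of each trial, so any column that varies within a trial, such as percent_of_maximum in the output above, shows only its first observation. The sketch below is one possible way to list which columns vary within a trial before collapsing. It assumes dplyr 1.0.0 or later (for across() and the .groups argument), and the fixed seed and the helper name varying are illustrative, not part of the original answers.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the random columns are reproducible

# Same layout as the sample data in the question
dat <- data.frame(
  Participant = rep(1:2, each = 30),
  placebo = c(rep("placebo", 15), rep("control", 30), rep("placebo", 15)),
  support = rep(rep(c("control", "support"), each = 3), 10),
  personality = rep(c("nice", "naughty"), each = 30),
  session_number = rep(rep(1:2, each = 15), 2),
  trial_number = rep(rep(1:5, each = 3), 4),
  percent_of_maximum = runif(60, min = 0, max = 100),
  perceived_difficulty = runif(60, min = 50, max = 100)
)

# Count distinct values of every non-grouping column within each trial,
# then take the largest count per column across all trials
varying <- dat %>%
  group_by(Participant, session_number, trial_number) %>%
  summarise(across(everything(), n_distinct), .groups = "drop") %>%
  select(-Participant, -session_number, -trial_number) %>%
  summarise(across(everything(), max))

# Columns with more than one distinct value inside at least one trial
names(varying)[unlist(varying) > 1]
```

In the sample data, perceived_difficulty is generated with runif(60, ...) just like percent_of_maximum, so it should be reported as varying here; for columns like that, the first-row value kept by slice(1) may not be the one you want.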