So far I have this df (without the result column):
df <- data.frame(number = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 value1 = c(5,7,6,9,3,5,6,3,4,5,5,6),
                 group = c("control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3"),
                 result = c(1,1.4,1.2,1.8,1.0,1.67,2,1,1,1.25,1,1.2))
   number value1    group result
1       1      5  control   1.00
2       1      7 Treated1   1.40
3       1      6 Treated2   1.20
4       1      9 Treated3   1.80
5       2      3  control   1.00
6       2      5 Treated1   1.67
7       2      6 Treated2   2.00
8       2      3 Treated3   1.00
9       3      4  control   1.00
10      3      5 Treated1   1.25
11      3      5 Treated2   1.00
12      3      6 Treated3   1.20
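I want to group the data by number and group, and then divide the value1 of each subgroup of group by the value1 of the control for the same number, but I'm struggling to get this working.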
For example:
Line1: 5/5 = 1.0
Line2: 7/5 = 1.40
Line3: 6/5 = 1.20
Line4: 9/5 = 1.80
Line5: 3/3 = 1.0
I tried something like this (which obviously doesn't work):
library(dplyr)
df <- df %>%
group_by(number) %>%
mutate(result = value1[group == contains("Treated")] / value1[group == control)
Do you have any ideas?
You can index the value1 where group == "control" and divide every value1 in the same number group by that value.
library(dplyr)
df %>% group_by(number) %>% mutate(result = value1/value1[group == "control"])
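Or you can arrange by the group column so that "control" is always the first value within each number. Note that this relies on "control" sorting ahead of the "Treated" labels, which depends on the sort locale: as far as I know, dplyr 1.1.0 and later sort character columns in the C locale by default, where uppercase letters come before lowercase ones, so check that "control" really ends up first in your version.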
df %>% group_by(number) %>%
arrange(number, group) %>%
mutate(result = value1/first(value1))
# A tibble: 12 × 4
# Groups:   number [3]
   number value1 group    result
    <dbl>  <dbl> <chr>     <dbl>
 1      1      5 control    1
 2      1      7 Treated1   1.4
 3      1      6 Treated2   1.2
 4      1      9 Treated3   1.8
 5      2      3 control    1
 6      2      5 Treated1   1.67
 7      2      6 Treated2   2
 8      2      3 Treated3   1
 9      3      4 control    1
10      3      5 Treated1   1.25
11      3      5 Treated2   1.25
12      3      6 Treated3   1.5
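If you would rather not depend on how the group labels happen to sort, one possible variation is to sort on the logical expression group != "control" instead of on group itself. This is only a sketch, not part of the original answer: it assumes each number has exactly one "control" row, and the name res is just an illustrative choice.

```r
library(dplyr)

# Same data as in the question, without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

res <- df %>%
  # FALSE sorts before TRUE, so the control row comes first within each number
  arrange(number, group != "control") %>%
  group_by(number) %>%
  # assumption: exactly one control row per number, so first() is that row
  mutate(result = value1 / first(value1)) %>%
  ungroup()
```

Because the sort key is a logical rather than a string, the position of "control" no longer depends on locale or on how the treatment labels are capitalised.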
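If you expect each group to always contain exactly one "control", consider indexing with [[which(group == "control")]] rather than [group == "control"]. This is a bit less concise than @benson23's solution and may be slightly slower, but if "control" appears more than once in a group, [[ will alert you by throwing an error.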
For example, suppose you forgot to group the data by number. [[ appropriately throws an error:
library(dplyr)
df %>% mutate(result = value1/value1[[which(group == "control")]])
# Error in `mutate()`:
# ℹ In argument: `result = value1/value1[[which(group == "control")]]`.
# Caused by error in `value1[[which(group == "control")]]`:
# ! attempt to select more than one element in vectorIndex
Whereas [ recycles the three control values across all twelve rows and silently returns the wrong output:
df %>% mutate(result = value1/value1[group == "control"])
# number value1 group result
# 1 1 5 control 1.000000
# 2 1 7 Treated1 2.333333
# 3 1 6 Treated2 1.500000
# 4 1 9 Treated3 1.800000
# 5 2 3 control 1.000000
# 6 2 5 Treated1 1.250000
# 7 2 6 Treated2 1.200000
# 8 2 3 Treated3 1.000000
# 9 3 4 control 1.000000
# 10 3 5 Treated1 1.000000
# 11 3 5 Treated2 1.666667
# 12 3 6 Treated3 1.500000
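For completeness, here is a sketch of the [[ index with the grouping in place, where each number has exactly one "control" and so no error is raised. The data frame is rebuilt from the question, and the name res is only an illustrative choice.

```r
library(dplyr)

# Same data as in the question, without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

res <- df %>%
  group_by(number) %>%
  # which() yields a single position per group here, so [[ succeeds;
  # zero or several controls in a group would make it fail with an error
  mutate(result = value1 / value1[[which(group == "control")]]) %>%
  ungroup()
```

With the grouping restored, each value1 is divided by the control value of its own number (5, 3 and 4 respectively).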