Divide different groups by a reference group

So far I have this df (without the result column):

df <- data.frame(number = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 value1 = c(5,7,6,9,3,5,6,3,4,5,5,6),
                 group = c("control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3"),
                 result = c(1,1.4,1.2,1.8,1.0,1.67,2,1,1,1.25,1,1.2))

   number value1    group result
1       1      5  control   1.00
2       1      7 Treated1   1.40
3       1      6 Treated2   1.20
4       1      9 Treated3   1.80
5       2      3  control   1.00
6       2      5 Treated1   1.67
7       2      6 Treated2   2.00
8       2      3 Treated3   1.00
9       3      4  control   1.00
10      3      5 Treated1   1.25
11      3      5 Treated2   1.00
12      3      6 Treated3   1.20

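I want to group the data by number and group, then divide the value1 of each group by the value1 of the control group with the same number, but I am struggling to do this. For example: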

Line1: 5/5 = 1.0
Line2: 7/5 = 1.40
Line3: 6/5 = 1.20
Line4: 9/5 = 1.80
Line5: 3/3 = 1.0

I tried something like this (which obviously doesn't work):

library(dplyr)
df <- df %>%
   group_by(number) %>%
   mutate(result = value1[group == contains("Treated")] / value1[group == control)

Do you have any ideas?

r dataframe dplyr group-by
2 Answers
1 vote

You can index the value1 where group == "control" and divide every value1 with the same number by that value.

library(dplyr)

df %>% group_by(number) %>% mutate(result = value1/value1[group == "control"])

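Or you can sort the rows so that "control" is always the first value within each number, then divide by first(value1). Sorting on group != "control" puts the control row first however the labels themselves collate.

df %>% 
  arrange(number, group != "control") %>% 
  group_by(number) %>% 
  mutate(result = value1/first(value1))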
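A note on why sort order deserves care here. Whether "control" lands before "Treated1" when sorting on the labels themselves depends on collation: in the C locale, uppercase letters sort before lowercase ones, and as far as I know dplyr 1.1.0 and later sort character columns in the C locale by default. The base R sketch below illustrates the effect with sort(method = "radix"), which uses C-locale collation for character vectors; the labels vector is made up for the demonstration.

```r
# Hypothetical label vector, in the same style as the group column.
labels <- c("Treated1", "control", "Treated2", "Treated3")

# Radix sorting of character vectors uses C-locale collation,
# where uppercase letters come before lowercase ones.
c_sorted <- sort(labels, method = "radix")
print(c_sorted)

# Ordering on a logical key instead: FALSE (the control) sorts before TRUE,
# so the control comes first regardless of collation.
control_first <- labels[order(labels != "control")]
print(control_first)
```

With C-locale collation the control ends up last rather than first, so first(value1) would pick a treated row. Ordering on the logical key avoids relying on how the labels happen to be spelled.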

Output

# A tibble: 12 × 4
# Groups:   number [3]
   number value1 group    result
    <dbl>  <dbl> <chr>     <dbl>
 1      1      5 control    1   
 2      1      7 Treated1   1.4 
 3      1      6 Treated2   1.2 
 4      1      9 Treated3   1.8 
 5      2      3 control    1   
 6      2      5 Treated1   1.67
 7      2      6 Treated2   2   
 8      2      3 Treated3   1   
 9      3      4 control    1   
10      3      5 Treated1   1.25
11      3      5 Treated2   1.25
12      3      6 Treated3   1.5 
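
For comparison, the same division can be sketched in base R without dplyr, by looking up each row's control value with match(). This is only an illustration and not part of the answer above; the helper data frame ctrl is a name introduced here, and the sketch assumes at most one control row per number (match() would silently use the first one if there were several).

```r
# Rebuild the question's data without the result column.
df <- data.frame(
  number = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

# One row per number, holding that number's control value.
ctrl <- df[df$group == "control", c("number", "value1")]

# match() gives, for every row of df, the position of its number in ctrl.
# A number without a control row yields NA, so its results become NA.
df$result <- df$value1 / ctrl$value1[match(df$number, ctrl$number)]

print(df)
```

Because the lookup is keyed on number, this version does not depend on the order of the rows.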

0 votes

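If you expect every group to contain exactly one "control", consider indexing with [[which(group == "control")]] instead of [group == "control"]. This is less concise than @benson23's solution and may be a little slower. But if "control" appears more than once in a group, [[ will alert you by throwing an error.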

For example, suppose you forget to group the data by number. [[ appropriately throws an error:

library(dplyr)

df %>% mutate(result = value1/value1[[which(group == "control")]])
# Error in `mutate()`:
#   ℹ In argument: `result = value1/value1[[which(group == "control")]]`.
# Caused by error in `value1[[which(group == "control")]]`:
#   ! attempt to select more than one element in vectorIndex

[ silently returns the wrong output:

df %>% mutate(result = value1/value1[group == "control"])
#    number value1    group   result
# 1       1      5  control 1.000000
# 2       1      7 Treated1 2.333333
# 3       1      6 Treated2 1.500000
# 4       1      9 Treated3 1.800000
# 5       2      3  control 1.000000
# 6       2      5 Treated1 1.250000
# 7       2      6 Treated2 1.200000
# 8       2      3 Treated3 1.000000
# 9       3      4  control 1.000000
# 10      3      5 Treated1 1.000000
# 11      3      5 Treated2 1.666667
# 12      3      6 Treated3 1.500000
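
If a more descriptive error message is wanted, the same check can be written out explicitly. The following is a minimal sketch, not something provided by dplyr or base R: ratio_to_control() is a hypothetical helper name, and the small vectors are a subset of the question's data used only to exercise it.

```r
# Hypothetical helper: divide `value` by the value of the single control row,
# stopping with a readable message unless exactly one control is present.
ratio_to_control <- function(value, group, control = "control") {
  idx <- which(group == control)
  if (length(idx) != 1L) {
    stop("expected exactly one '", control, "' row, found ", length(idx),
         call. = FALSE)
  }
  value / value[idx]
}

value1 <- c(5, 7, 6, 9, 3, 5, 6, 3)
group  <- rep(c("control", "Treated1", "Treated2", "Treated3"), times = 2)

# A single block (the first four rows) has exactly one control, so this works.
one_block <- ratio_to_control(value1[1:4], group[1:4])
print(one_block)

# Two blocks passed at once contain two controls, so the helper stops.
msg <- tryCatch(
  ratio_to_control(value1, group),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Inside a dplyr pipeline it would be called as mutate(result = ratio_to_control(value1, group)) after group_by(number). Unlike the bare [[ index, it also reports how many control rows it found, which covers the case of a group with no control at all.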