Here is a sample data frame:
data.frame(sample = c('A','A','A','A','A','B','B','B','B','B'),
           measure = c(20,30,40,60,60,20,60,50,40,10),
           time = c(1,2,3,4,5,3,4,5,6,7),
           start = c(1,1,1,1,1,3,3,3,3,3),
           end = c(4,4,4,4,4,6,6,6,6,6))
sample measure time start end
1 A 20 1 1 4
2 A 30 2 1 4
3 A 40 3 1 4
4 A 60 4 1 4
5 A 60 5 1 4
6 B 20 3 3 6
7 B 60 4 3 6
8 B 50 5 3 6
9 B 40 6 3 6
10 B 10 7 3 6
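What I want to do is create a new column called "calc" that, for each sample, divides the measure at time == end by the measure at time == start.
My first guess was to group_by sample, but I can't figure out how to reference the start and end times. The final data should look like this: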
sample measure time start end calc
1 A 20 1 1 4 3
2 A 30 2 1 4 3
3 A 40 3 1 4 3
4 A 60 4 1 4 3
5 A 60 5 1 4 3
6 B 20 3 3 6 2
7 B 60 4 3 6 2
8 B 50 5 3 6 2
9 B 40 6 3 6 2
10 B 10 7 3 6 2
How about this (d is your data frame):
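library(dplyr)

d |>
  # flag the rows where time equals that row's start or end
  rowwise() |>
  mutate(is_endpoint = any(time == start, time == end)) |>
  ungroup() |>
  # per sample, divide the larger endpoint measure by the smaller one;
  # this equals end / start only when measure at end >= measure at start
  mutate(
    calc = max(measure[is_endpoint]) / min(measure[is_endpoint]),
    .by = sample  # .by needs dplyr >= 1.1.0
  ) |>
  select(-is_endpoint)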
Output:
# A tibble: 10 x 6
sample measure time start end calc
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 20 1 1 4 3
2 A 30 2 1 4 3
3 A 40 3 1 4 3
4 A 60 4 1 4 3
5 A 60 5 1 4 3
6 B 20 3 3 6 2
7 B 60 4 3 6 2
8 B 50 5 3 6 2
9 B 40 6 3 6 2
10 B 10 7 3 6 2
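Because max()/min() returns the larger endpoint measure over the smaller one, it would not give end / start for a sample whose measure falls between start and end. A minimal sketch of a more direct variant is below. It assumes each sample has exactly one row with time == start and exactly one with time == end; the name result is only for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
library(dplyr)

# Sample data from the question, assigned to d as in the answer above.
d <- data.frame(
  sample  = c('A','A','A','A','A','B','B','B','B','B'),
  measure = c(20,30,40,60,60,20,60,50,40,10),
  time    = c(1,2,3,4,5,3,4,5,6,7),
  start   = c(1,1,1,1,1,3,3,3,3,3),
  end     = c(4,4,4,4,4,6,6,6,6,6)
)

# Within each sample, pick the measure on the row where time == end and
# divide it by the measure on the row where time == start.
# Assumption: exactly one matching row per sample for each condition,
# so the length-1 ratio is recycled across the whole group.
result <- d |>
  mutate(
    calc = measure[time == end] / measure[time == start],
    .by = sample  # .by needs dplyr >= 1.1.0; use group_by(sample) on older versions
  )

print(result)
```

This skips rowwise() and the helper column entirely, and it preserves the direction of the ratio. If a sample could lack a start or end row, the subset would have length zero and mutate() would raise an error, so that case would need separate handling.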