Apologies if the post title is a bit confusing. Suppose I have the following data frame:
set.seed(123)
test <- data.frame("chr" = rep("chr1",30), "position" = sample(c(1:50), 30, replace = F) ,
"info" = sample(c("X","Y"), 30, replace = T),
"condition"= sample(c("soft","stiff"), 30, replace = T) )
## head(test)
chr position info condition
1 chr1 31 Y soft
2 chr1 15 Y soft
3 chr1 14 X soft
4 chr1 3 X soft
5 chr1 42 X stiff
6 chr1 43 X stiff
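I want to bin the position column, say with a bin size of 10. Then, for each condition (soft or stiff), I want to count how often each value occurs in the info column. So the data would look like this (not the actual result for the data above):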
chr start end condition count_Y count_X
1 chr1 1 10 soft 2 3
2 chr1 1 10 stiff 0 2
3 chr1 11 20 soft 2 5
4 chr1 11 20 stiff 1 2
5 chr1 21 30 soft 2 0
6 chr1 21 30 stiff 0 4
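To make this easier, it might be best to split the data into two data frames by condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Thank you very much.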
Using cut for the binning, together with dplyr::count and tidyr::pivot_wider, you could do:
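library(dplyr, warn.conflicts = FALSE)
library(tidyr)
test |>
  mutate(
    # Bin index 1..5 for the intervals (0,10], (10,20], ..., (40,50]
    bin = cut(position, seq(0, 50, 10), labels = FALSE),
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  # One row per chr/bin/condition/info combination, with its count in n
  count(chr, start, end, condition, info) |>
  # Spread the info values into the count_X and count_Y columns
  tidyr::pivot_wider(
    names_from = info,
    values_from = n,
    names_prefix = "count_",
    values_fill = 0
  )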
#> # A tibble: 9 × 6
#> chr start end condition count_X count_Y
#> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 chr1 1 10 soft 4 0
#> 2 chr1 1 10 stiff 2 1
#> 3 chr1 11 20 soft 3 3
#> 4 chr1 21 30 soft 1 1
#> 5 chr1 21 30 stiff 3 1
#> 6 chr1 31 40 soft 0 2
#> 7 chr1 31 40 stiff 2 1
#> 8 chr1 41 50 soft 0 1
#> 9 chr1 41 50 stiff 4 1
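If you would rather avoid the extra packages, roughly the same table can be built in base R. This is only a sketch under a few assumptions of mine: positions are positive integers, the bin width is stored in a variable I named bin_size, and the bin index comes from integer division rather than cut. The helper columns count_X and count_Y are added to test purely for illustration.

```r
# Rebuild the example data from the question.
set.seed(123)
test <- data.frame(
  chr = rep("chr1", 30),
  position = sample(1:50, 30, replace = FALSE),
  info = sample(c("X", "Y"), 30, replace = TRUE),
  condition = sample(c("soft", "stiff"), 30, replace = TRUE)
)

bin_size <- 10  # assumed bin width, taken from the question

# Integer bin index: positions 1..10 -> 1, 11..20 -> 2, and so on.
bin <- (test$position - 1) %/% bin_size + 1
test$start <- (bin - 1) * bin_size + 1
test$end <- bin * bin_size

# Indicator columns so that summing them yields the per-group counts.
test$count_X <- as.integer(test$info == "X")
test$count_Y <- as.integer(test$info == "Y")

# Sum the indicators within each chr/start/end/condition group.
res <- aggregate(
  cbind(count_X, count_Y) ~ chr + start + end + condition,
  data = test,
  FUN = sum
)

# Order rows by bin, then by condition, for easier reading.
res <- res[order(res$start, res$condition), ]
rownames(res) <- NULL
res
```

As with the dplyr version, bin/condition combinations that have no rows at all are simply absent from the result rather than appearing with zero counts.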