(R) Binning a numeric column to count occurrences after group by

Problem description

Apologies if the post title is a bit confusing. Suppose I have the following data frame:

set.seed(123)
test <- data.frame(chr = rep("chr1", 30),
                   position = sample(1:50, 30, replace = FALSE),
                   info = sample(c("X", "Y"), 30, replace = TRUE),
                   condition = sample(c("soft", "stiff"), 30, replace = TRUE))

## head(test)
   chr position info condition
1 chr1       31    Y      soft
2 chr1       15    Y      soft
3 chr1       14    X      soft
4 chr1        3    X      soft
5 chr1       42    X     stiff
6 chr1       43    X     stiff

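I want to bin the position column, say with a bin size of 10. Then, for each condition (soft or stiff), I want to count the occurrences of each value in the info column. So the data would look like this (not the actual result for the data above):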

   chr start end condition count_Y count_X
1 chr1     1  10      soft       2       3
2 chr1     1  10     stiff       0       2
3 chr1    11  20      soft       2       5
4 chr1    11  20     stiff       1       2
5 chr1    21  30      soft       2       0
6 chr1    21  30     stiff       0       4

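To make this easier, it might be better to create two data frames based on condition and then apply the binning and counting, but I am stuck on that part. Any help is appreciated. Thank you very much.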

1 Answer

Using cut for the binning, together with dplyr::count and tidyr::pivot_wider, you could do:

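library(dplyr, warn.conflicts = FALSE)
library(tidyr)

test |>
  mutate(
    # Right-closed intervals (0,10], (10,20], ...; labels = FALSE returns the bin index
    bin = cut(position, seq(0, 50, 10), labels = FALSE),
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  # One row per bin, condition and info value, with the row count in n
  count(chr, start, end, condition, info) |>
  # Spread the info values into count_X / count_Y columns; missing combinations become 0
  pivot_wider(
    names_from = info,
    values_from = n,
    names_prefix = "count_",
    values_fill = 0
  )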
#> # A tibble: 9 × 6
#>   chr   start   end condition count_X count_Y
#>   <chr> <dbl> <dbl> <chr>       <int>   <int>
#> 1 chr1      1    10 soft            4       0
#> 2 chr1      1    10 stiff           2       1
#> 3 chr1     11    20 soft            3       3
#> 4 chr1     21    30 soft            1       1
#> 5 chr1     21    30 stiff           3       1
#> 6 chr1     31    40 soft            0       2
#> 7 chr1     31    40 stiff           2       1
#> 8 chr1     41    50 soft            0       1
#> 9 chr1     41    50 stiff           4       1
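
To see how the cut call above turns positions into start and end columns, here is a minimal base R sketch. The helper name bin_bounds and its width argument are hypothetical, introduced only for illustration and not part of the original answer, and the sketch assumes positions are integers starting at 1.

```r
# Hypothetical helper (not from the original answer): map positions to
# 1-based bins of a given width, mirroring the cut() call used above.
# Assumes every position is >= 1; a position of 0 would fall outside
# the first interval and come back as NA.
bin_bounds <- function(position, width = 10) {
  # Breaks 0, width, 2 * width, ... up to the first multiple covering max(position)
  breaks <- seq(0, ceiling(max(position) / width) * width, by = width)
  # Intervals are right-closed, so position 10 belongs to bin 1, 11 to bin 2
  bin <- cut(position, breaks, labels = FALSE)
  data.frame(
    position = position,
    start = (bin - 1) * width + 1,
    end = bin * width
  )
}

bounds <- bin_bounds(c(1, 10, 11, 50))
print(bounds)
# start is 1, 1, 11, 41 and end is 10, 10, 20, 50
```

Because the intervals are right-closed, a position that is an exact multiple of the width (10, 20, ...) is the last element of its bin, which is what makes start = (bin - 1) * 10 + 1 and end = bin * 10 line up with the 1-10, 11-20, ... ranges asked for in the question.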