How to count how many values per level in a given factor?

For each year, I would like to create two new columns, temp_count and rh_count, that count how many times each level of temp_catog and humidity_catog occurs, respectively. The question "How to count how many values per level in a given factor?" covers grouping by a single variable, but I would like to use group_by(year, humidity_catog, temp_catog).

With the following code I can create a column humidity_count that counts how many times each level of humidity_catog occurs:

df <- df %>%
  group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())

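But I would like to create another column, temp_count, in the same data frame that counts how many times each level of temp_catog occurs. How can I do this? Here is a reproducible example of my data, created with dput():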

df <- structure(
  list(
    year = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L),
      .Label = c(
        "2006",
        "2007",
        "2012",
        "2013",
        "2014",
        "2014_c",
        "2015_a",
        "2015_b",
        "2016",
        "2017",
        "2020"
      ),
      class = "factor"
    ),
    min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
               47.1),
    min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
                 15.1, 11.8, 9.5),
    temp_catog = structure(
      c(2L, 2L, 3L, 3L,
        2L, 3L, 2L, 3L, 2L, 2L),
      .Label = c("T1(<=8)", "T2(>8, <=15)",
                 "T3(>15)"),
      class = "factor"
    ),
    humidity_catog = structure(
      c(1L,
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
      .Label = c("RH1(<=65)",
                 "RH2(>65)"),
      class = "factor"
    )
  ),
  class = c("grouped_df",
            "tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-10L),
  groups = structure(
    list(
      year = structure(
        1L,
        .Label = c(
          "2006",
          "2007",
          "2012",
          "2013",
          "2014",
          "2014_c",
          "2015_a",
          "2015_b",
          "2016",
          "2017",
          "2020"
        ),
        class = "factor"
      ),
      .rows = structure(
        list(1:10),
        ptype = integer(0),
        class = c("vctrs_list_of",
                  "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-1L),
    .drop = TRUE
  )
)

Note: I don't want the number of unique occurrences. I just want to count how many times each category was recorded.

r dataframe dplyr count data-manipulation
1 Answer

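I'm not quite sure how the OP would merge the two summarised results, but we can call mutate instead of summarise, passing the grouping variables to the .by argument one step at a time. Unlike summarise, mutate keeps every row, so both counts end up in the same data frame.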

Note: the toy data frame is grouped by year, so I ungroup it first.
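
To see why the ungrouping step matters, here is a minimal sketch of how to inspect and remove a stored grouping. The data frame toy is hypothetical and only mimics the structure of the dput() output, which carries the class grouped_df. As far as I know, dplyr refuses the .by argument when the input is already a grouped data frame, which is the reason for calling ungroup() first.

```r
library(dplyr)

# Hypothetical toy data, grouped by year like the dput() example
toy <- data.frame(
  year = factor(c("2006", "2006", "2007")),
  humidity_catog = factor(c("RH1", "RH2", "RH1"))
) %>%
  group_by(year)

# Grouping columns currently attached to the data frame
grouped_by <- group_vars(toy)

# ungroup() drops the stored grouping; no grouping columns remain
ungrouped <- ungroup(toy)
remaining <- group_vars(ungrouped)
```

Here grouped_by holds the name of the grouping column and remaining is empty, so ungrouped can be passed to a mutate call that uses .by.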

library(dplyr) # the .by argument requires dplyr >= 1.1.0

df %>%
    ungroup() %>%
    mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
    mutate(temp_count = n(), .by = c(year, temp_catog))

# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
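
If dplyr 1.1.0 or later is not available, the same per-row counts can probably be obtained without .by. The sketch below shows two alternatives, add_count() and group_by() followed by mutate(). The data frame toy and its values are made up for illustration and are not the OP's data.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  year = factor(c("2006", "2006", "2006", "2007", "2007")),
  temp_catog = factor(c("T2", "T3", "T2", "T2", "T2")),
  humidity_catog = factor(c("RH1", "RH1", "RH2", "RH1", "RH1"))
)

# Option 1: add_count() appends a per-group count and keeps every row
res_add_count <- toy %>%
  add_count(year, humidity_catog, name = "rh_count") %>%
  add_count(year, temp_catog, name = "temp_count")

# Option 2: group_by() + mutate(); the second group_by() replaces
# the first grouping, and ungroup() removes it at the end
res_group_by <- toy %>%
  group_by(year, humidity_catog) %>%
  mutate(rh_count = n()) %>%
  group_by(year, temp_catog) %>%
  mutate(temp_count = n()) %>%
  ungroup()
```

Both options keep one row per observation, so the counts repeat within each group, just as in the .by output above.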