How to count how many values per level in a given factor?

Question

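For each year, I would like to create two new columns, temp_count and rh_count, that count the number of occurrences of each level in the temp_catog and humidity_catog columns, respectively. The question "How to count how many values per level in a given factor?" answers this if you group by one variable, but I would like to use group_by(year, humidity_catog, temp_catog).

[Screenshot of the data]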

With the following code I can create a column, humidity_count, that counts the number of occurrences of each level in the humidity_catog column.

df <- df %>%
  group_by(year, humidity_catog) %>%
  summarize(humidity_count = n())

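This is the output:

[Screenshot of the output]

However, I would like to create another column, temp_count, in the same data frame that counts the number of occurrences of each level in the temp_catog column. How can I do this? Here is a reproducible example of my data, created with the dput function.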

df <- structure(
  list(
    year = structure(
      c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
        1L, 1L, 1L),
      .Label = c(
        "2006",
        "2007",
        "2012",
        "2013",
        "2014",
        "2014_c",
        "2015_a",
        "2015_b",
        "2016",
        "2017",
        "2020"
      ),
      class = "factor"
    ),
    min_rh = c(47.9, 49, 44.7, 40.2, 50, 52.3, 51.5, 82.8, 73.8,
               47.1),
    min_temp = c(12.4, 14.3, 15.1, 16.1, 12.7, 16.1, 14.4,
                 15.1, 11.8, 9.5),
    temp_catog = structure(
      c(2L, 2L, 3L, 3L,
        2L, 3L, 2L, 3L, 2L, 2L),
      .Label = c("T1(<=8)", "T2(>8, <=15)",
                 "T3(>15)"),
      class = "factor"
    ),
    humidity_catog = structure(
      c(1L,
        1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L),
      .Label = c("RH1(<=65)",
                 "RH2(>65)"),
      class = "factor"
    )
  ),
  class = c("grouped_df",
            "tbl_df", "tbl", "data.frame"),
  row.names = c(NA,-10L),
  groups = structure(
    list(
      year = structure(
        1L,
        .Label = c(
          "2006",
          "2007",
          "2012",
          "2013",
          "2014",
          "2014_c",
          "2015_a",
          "2015_b",
          "2016",
          "2017",
          "2020"
        ),
        class = "factor"
      ),
      .rows = structure(
        list(1:10),
        ptype = integer(0),
        class = c("vctrs_list_of",
                  "vctrs_vctr", "list")
      )
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA,-1L),
    .drop = TRUE
  )
)

Note: I do not want the number of unique occurrences. I only want to count how many times each category was recorded.

Answer 1

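I am not sure how the OP would merge the two summarised results, but we can call mutate instead of summarise, passing each set of grouping variables to the .by argument in turn. Unlike summarise, mutate keeps every row, so both counts end up in the same data frame.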

Obs: the toy data frame is grouped by year, so I ungroup it first.

library(dplyr) #requires dplyr 1.1.0 for the .by solution

df %>%
    ungroup() %>%
    mutate(rh_count = n(), .by = c(year, humidity_catog)) %>%
    mutate(temp_count = n(), .by = c(year, temp_catog))

# A tibble: 10 × 7
   year  min_rh min_temp temp_catog   humidity_catog rh_count temp_count
   <fct>  <dbl>    <dbl> <fct>        <fct>             <int>      <int>
 1 2006    47.9     12.4 T2(>8, <=15) RH1(<=65)             8          6
 2 2006    49       14.3 T2(>8, <=15) RH1(<=65)             8          6
 3 2006    44.7     15.1 T3(>15)      RH1(<=65)             8          4
 4 2006    40.2     16.1 T3(>15)      RH1(<=65)             8          4
 5 2006    50       12.7 T2(>8, <=15) RH1(<=65)             8          6
 6 2006    52.3     16.1 T3(>15)      RH1(<=65)             8          4
 7 2006    51.5     14.4 T2(>8, <=15) RH1(<=65)             8          6
 8 2006    82.8     15.1 T3(>15)      RH2(>65)              2          4
 9 2006    73.8     11.8 T2(>8, <=15) RH2(>65)              2          6
10 2006    47.1      9.5 T2(>8, <=15) RH1(<=65)             8          6
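
For readers on a dplyr version older than 1.1.0, where .by is not available, the same idea can be sketched with add_count() or with group_by() followed by mutate(). This is only an illustration under assumptions: the toy data frame below is a made-up stand-in with shortened category labels, not the OP's data, and the names toy, res_add_count and res_group_by are hypothetical.

```r
library(dplyr)

# Small stand-in for the question's data (values are illustrative, not the OP's).
toy <- tibble(
  year           = factor(c("2006", "2006", "2006", "2007", "2007")),
  temp_catog     = factor(c("T2", "T3", "T2", "T2", "T2")),
  humidity_catog = factor(c("RH1", "RH1", "RH2", "RH1", "RH1"))
)

# add_count() appends a per-group row count without collapsing rows.
res_add_count <- toy %>%
  add_count(year, humidity_catog, name = "rh_count") %>%
  add_count(year, temp_catog, name = "temp_count")

# Same result with group_by() + mutate(); the second group_by() replaces
# the first grouping, and ungroup() drops it at the end.
res_group_by <- toy %>%
  group_by(year, humidity_catog) %>%
  mutate(rh_count = n()) %>%
  group_by(year, temp_catog) %>%
  mutate(temp_count = n()) %>%
  ungroup()
```

Both variants keep all rows and repeat the group count on every row of the group, which is what distinguishes them from the summarise() attempt in the question.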
