group_by and count the number of rows that meet a condition in R

Question    Votes: 1    Answers: 1

I have a data table like this:

city         year    temp
Seattle      2019    82 
Seattle      2018    10 
NYC          2010    78 
DC           2011    71 
DC           2011    10 
DC           2018    60 

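I want to group the rows by city and year and create a new table that shows, for example, in how many years Seattle's temperature was between 10 and 20, in how many it was between 20 and 30, and so on.

How can I do this?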

r group-by condition
1 Answer

Votes: 1

We can use cut to assign temp to bins, then group by city and temp_range and summarize:

library(dplyr)

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 6 x 3
# Groups:   city [3]
  city    temp_range years
  <fct>   <fct>      <int>
1 DC      (0,10]         1
2 DC      (50,60]        1
3 DC      (70,80]        1
4 NYC     (70,80]        1
5 Seattle (0,10]         1
6 Seattle (80,90]        1
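
One thing to watch: by default cut builds intervals that are closed on the right, so a temperature of exactly 10 lands in (0,10] rather than in a 10-to-20 bin. If "between 10 and 20" is meant to include 10, a variant with right = FALSE may be closer to what is wanted. The sketch below rebuilds the question's six rows with data.frame() so it runs on its own; the name result_left is only illustrative.

```r
library(dplyr)

# Same six rows as the question's sample data
df <- data.frame(
  city = factor(c("Seattle", "Seattle", "NYC", "DC", "DC", "DC")),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  temp = c(82L, 10L, 78L, 71L, 10L, 60L)
)

# right = FALSE closes each bin on the left: [0,10), [10,20), ...
result_left <- df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10), right = FALSE)) %>%
  group_by(city, temp_range) %>%
  summarize(years = n_distinct(year)) %>%
  ungroup()

result_left
```

With this change the two readings of 10 fall in [10,20) and DC's 60 moves to [60,70). A value of exactly 100 would become NA unless include.lowest = TRUE is also passed to cut.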

As of dplyr 0.8.0, we can also keep empty factor levels by setting the new .drop argument of group_by to FALSE:

df %>%
  mutate(temp_range = cut(temp, breaks = seq(0, 100, 10))) %>%
  group_by(city, temp_range, .drop = FALSE) %>%
  summarize(years = n_distinct(year))

Output:

# A tibble: 30 x 3
# Groups:   city [3]
   city  temp_range years
   <fct> <fct>      <int>
 1 DC    (0,10]         1
 2 DC    (10,20]        0
 3 DC    (20,30]        0
 4 DC    (30,40]        0
 5 DC    (40,50]        0
 6 DC    (50,60]        1
 7 DC    (60,70]        0
 8 DC    (70,80]        1
 9 DC    (80,90]        0
10 DC    (90,100]       0
# ... with 20 more rows
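
As a cross-check that does not rely on dplyr, roughly the same counts can be produced in base R. This is only a sketch under the assumption that each year should be counted once per city and bin; distinct_rows and counts are illustrative names. Because table keeps every factor level, the empty bins appear as zeros without any extra argument.

```r
# Same six rows as the question's sample data
df <- data.frame(
  city = factor(c("Seattle", "Seattle", "NYC", "DC", "DC", "DC")),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  temp = c(82L, 10L, 78L, 71L, 10L, 60L)
)

# Bin the temperatures exactly as in the answer above
df$temp_range <- cut(df$temp, breaks = seq(0, 100, 10))

# Drop duplicate (city, year, bin) rows so each year counts once per bin
distinct_rows <- unique(df[c("city", "year", "temp_range")])

# Cross-tabulate: one row per city, one column per bin, zeros for empty bins
counts <- table(distinct_rows$city, distinct_rows$temp_range)
counts
```

Calling as.data.frame(counts) turns the 3 x 10 table into a long, 30-row layout comparable to the tibble shown above.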

Data:

df <- structure(list(city = structure(c(3L, 3L, 2L, 1L, 1L, 1L), .Label = c("DC", 
"NYC", "Seattle"), class = "factor"), year = c(2019L, 2018L, 
2010L, 2011L, 2011L, 2018L), temp = c(82L, 10L, 78L, 71L, 10L, 
60L)), class = "data.frame", row.names = c(NA, -6L))