我正在尝试从有关 COVID 19 病例的表格中获取具体值,并且我想找到每个月(有 4 个月)的最高病例数。
我创建了几个月的新列,然后重新组织了数据框,以便每个月的所有行均按案例降序排列,但是,当我尝试对月份使用不同的每月仅获取一行时,它确实如此没有什么。我是否使用了错误的功能?
# since you didn't provide a sample dataset, we download Our World in Data's COVID-19 dataset and transform it into something similar
country_date_cases <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/cases_deaths/new_cases.csv") |>
pivot_longer(-date, names_to = "Country.Region", values_to = "Cases") |>
filter(year(date) == 2020 & between(month(date), 1, 4))
date Country.Region Cases
<date> <chr> <dbl>
1 2020-01-03 World 0
2 2020-01-03 Afghanistan 0
3 2020-01-03 Africa 0
4 2020-01-03 Albania 0
5 2020-01-03 Algeria 0
6 2020-01-03 American Samoa 0
7 2020-01-03 Andorra 0
8 2020-01-03 Angola 0
9 2020-01-03 Anguilla 0
10 2020-01-03 Antigua and Barbuda 0
# ℹ 29,264 more rows
country_date_cases |>
mutate(Month = month(date)) |>
arrange(Month, desc(Cases)) |>
distinct(Country.Region, Month, .keep_all = TRUE) # just using Month gets the largest value for each month, which in my dataset, which includes the world total, is obviously the world total. If we include Country.Region in there too, we get the largest value for each country in each month.
# A tibble: 984 × 4
date Country.Region Cases Month
<date> <chr> <dbl> <dbl>
1 2020-01-31 World 2015 1
2 2020-01-31 Asia 2010 1
3 2020-01-31 Upper middle income 1990 1
4 2020-01-31 China 1984 1
5 2020-01-31 High income 20 1
6 2020-01-27 Thailand 7 1
7 2020-01-25 North America 5 1
8 2020-01-25 United States 5 1
9 2020-01-26 Europe 5 1
10 2020-01-26 European Union 5 1
# ℹ 974 more rows
# ℹ Use `print(n = ...)` to see more rows
# so this is equivalent to slice_max. However, with that said, as Seth has already commented, slice_max is the preferred option, as it is shorter, and more readable
country_date_cases |>
mutate(Month = month(date)) |>
slice_max(Cases, n = 1, by = c(Country.Region, Month), with_ties = FALSE) # with slice_max, we also have the option of including ties, which I didn't do here. If you wanted the Country.Region with the largest number of cases in each month, you could change this to `by = Month`