Here is my data:
library(dplyr)  # provides %>%

df <- data.frame(
  city = c("London", "Paris", "Rome", "Madrid", "Venice", "Bern"),
  Spring = c(10, 3, 6, 9, 23, 8),
  Summer = c(1, 5, 6, 4, 30, 12),
  Fall = c(22, 24, 15, 4, 12, 8),
  Winter = c(0, 12, 4, 22, 7, 9),
  check.names = FALSE
) %>%
  janitor::adorn_totals(c("row")) %>%            # add a "Total" row
  janitor::adorn_percentages("col") %>%          # convert counts to column shares
  janitor::adorn_pct_formatting(digits = 2) %>%  # format the shares as "xx.xx%"
  janitor::adorn_ns(position = "front")          # put the raw count in front of the percentage
I want to create the data frame below, which simply shows the two most-visited cities for each season.
This is what I have tried:
library(tidyr)  # pivot_longer(), unite()

semi_output <- df %>%
  filter(city != 'Total') %>%
  pivot_longer(cols = -city) %>%
  group_by(name) %>%
  slice_max(value, n = 2, with_ties = FALSE) %>%  # note: value is character here
  unite(city, c("city", "value"), sep = '-')
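It seems to capture the top 2 cities for each season (which is odd! These cells are not numeric; they contain characters such as "(" and "%". Is this always reliable? How does R order them?)
But my main question now is: how do I convert this into the desired output I showed above?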
If you need to start from after the adorn_* steps, then you could do this:
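library(tidyr)
library(dplyr)
library(purrr)

df |>
  filter(city != "Total") |>
  # Parse the leading count out of each "n (pct%)" string so the ranking is numeric
  pivot_longer(
    cols = Spring:Winter,
    values_transform = readr::parse_number,
    names_to = "season"
  ) |>
  # The `by` argument needs dplyr >= 1.1.0; with_ties = FALSE keeps exactly two rows per season
  slice_max(value, n = 2, by = season, with_ties = FALSE) |>
  # Build one single-column tibble per row: "city-<original formatted cell>"
  pmap(\(city, season, ...) tibble(!!season := paste0(city, "-", df[df$city == city, season]))) |>
  bind_rows() |>
  # Drop the NA padding so each season column keeps only its two entries
  map(na.omit) |>
  as.data.frame()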
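On the side question of how R orders those cells: after the adorn_* steps the values are character strings, so ranking them directly compares text character by character rather than by magnitude. The base R sketch below illustrates the difference for the Spring column. The strings are typed by hand and are only assumed to mirror the "n (pct%)" format shown in the question, and the variable names are made up for the illustration.

```r
# Hand-typed strings assumed to mirror the "n (pct%)" cells from the question.
spring <- c(London = "10 (16.95%)", Paris = "3 (5.08%)", Rome = "6 (10.17%)",
            Madrid = "9 (15.25%)", Venice = "23 (38.98%)", Bern = "8 (13.56%)")

# Text ordering: "9 ..." ranks above "23 ..." because "9" > "2" as a character.
top2_as_text <- names(spring)[order(spring, decreasing = TRUE, method = "radix")][1:2]

# Numeric ordering: keep only the leading count, convert it, then rank.
counts <- as.numeric(sub(" .*", "", spring))
top2_as_number <- names(spring)[order(counts, decreasing = TRUE)][1:2]

print(top2_as_text)    # Madrid, Bern
print(top2_as_number)  # Venice, London
```

So ranking the formatted strings is not reliable in general, which is why the answer converts them with readr::parse_number before calling slice_max.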