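Select the top n values (including %) for each column of a data frame and show them alongside the matching value of a specific column in R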


Here is my data:

library(dplyr)  # provides %>%

df <- data.frame(
    city = c("London", "Paris", "Rome", "Madrid", "Venice", "Bern"),
    Spring = c(10, 3, 6, 9, 23, 8),
    Summer = c(1, 5, 6, 4, 30, 12),
    Fall = c(22, 24, 15, 4, 12, 8),
    Winter = c(0, 12, 4, 22, 7, 9),
    check.names = FALSE
) %>%
    janitor::adorn_totals("row") %>%               # append a "Total" row
    janitor::adorn_percentages("col") %>%          # convert counts to column shares
    janitor::adorn_pct_formatting(digits = 2) %>%  # format shares as "xx.xx%" strings
    janitor::adorn_ns(position = "front")          # put the raw count in front: "n (xx.xx%)"

I would like to create the data frame below, which simply shows the two most visited cities in each season.

Here is what I tried:

semi_output <- df %>%
  filter(city != 'Total') %>%
  pivot_longer(cols = -city) %>%
  group_by(name) %>%
  slice_max(value, n = 2, with_ties = FALSE) %>%
  unite(city, c("city", "value"), sep = '-')

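It seems to capture the top 2 cities for each season. (Which is odd! These cells are not numeric; they contain characters such as "(" and "%". Is this always reliable? How does R sort them?)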
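On the reliability question, a minimal sketch in base R may help. It assumes the adorned cells are character strings shaped like "n (xx.xx%)"; the four values below are illustrative stand-ins, not output copied from the table. Character vectors are compared left to right as text, so a string starting with "9" ranks above one starting with "23", and ranking the formatted cells directly is not reliable in general.

```r
# Illustrative cells in the assumed "count (percent)" shape
cells <- c("10 (16.95%)", "3 (5.08%)", "23 (38.98%)", "9 (15.25%)")

# Text comparison: strings are compared character by character, so "9 ..." beats "23 ..."
top_as_text <- sort(cells, decreasing = TRUE)[1:2]

# Numeric comparison: strip everything after the first space, then rank by the count
counts <- as.numeric(sub(" .*$", "", cells))
top_as_number <- cells[order(counts, decreasing = TRUE)[1:2]]

print(top_as_text)
print(top_as_number)
```

Here the text ranking picks the cells starting with 9 and 3, while the numeric ranking picks 23 and 10. Parsing the count first (for example with readr::parse_number, or the base R sub() call used in this sketch) avoids the problem.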

But my main question now is: how do I convert this format into the desired output shown above?

r dataframe dplyr pivot-table
1 Answer

If you need to start from the data as it is after the adorn_ calls, you could do this:

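library(tidyr)
library(dplyr)
library(purrr)

df |>
  filter(city != "Total") |>
  # Parse the leading count out of each "n (xx.xx%)" string so the ranking is numeric
  pivot_longer(cols = Spring:Winter, values_transform = readr::parse_number, names_to = "season") |>
  # Top two cities per season (the `by` argument needs dplyr >= 1.1.0)
  slice_max(value, n = 2, by = season) |>
  # One single-column tibble per row: "city-<original formatted cell>"
  pmap(\(city, season, ...) tibble(!!season := paste0(city, "-", df[df$city == city, season]))) |>
  bind_rows() |>
  # Drop the NA padding in each column, then rebuild a two-row data frame
  map(na.omit) |>
  as.data.frame()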
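
For comparison, the same idea can be sketched in base R only. This is an alternative, not the method above. The `adorned` data frame is a small hypothetical stand-in for the table after the adorn_ calls (with the Total row already removed), and `top_two` is a helper name made up for this illustration.

```r
# Hypothetical stand-in for the adorned table: character cells in "n (xx.xx%)" form
adorned <- data.frame(
  city   = c("London", "Paris", "Rome"),
  Spring = c("10 (52.63%)", "3 (15.79%)", "6 (31.58%)"),
  Summer = c("1 (8.33%)", "5 (41.67%)", "6 (50.00%)"),
  check.names = FALSE
)

# Rank one season column by its leading count and label the top two cells with the city
top_two <- function(cells, cities) {
  counts <- as.numeric(sub(" .*$", "", cells))     # keep only the count before the space
  keep <- order(counts, decreasing = TRUE)[1:2]    # positions of the two largest counts
  paste0(cities[keep], "-", cells[keep])
}

# Apply the helper to every season column and collect the results column-wise
result <- as.data.frame(
  lapply(adorned[-1], top_two, cities = adorned$city),
  check.names = FALSE
)
print(result)
```

Each season column of `result` then holds two "city-cell" strings in descending order of count. Ties are not handled specially in this sketch; `order()` simply keeps the first two positions it finds.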