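I'm using the Pima Indians dataset. The goal is to plot diabetes status (Yes or No) against each feature, and then show both the totals and the percentages on the bars.
Here is the head of the data: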
> head(MASS::Pima.te, n = 10)
   npreg glu bp skin  bmi   ped age type
1      6 148 72   35 33.6 0.627  50  Yes
2      1  85 66   29 26.6 0.351  31   No
3      1  89 66   23 28.1 0.167  21   No
4      3  78 50   32 31.0 0.248  26  Yes
5      2 197 70   45 30.5 0.158  53  Yes
6      5 166 72   19 25.8 0.587  51  Yes
7      0 118 84   47 45.8 0.551  31  Yes
8      1 103 30   38 43.3 0.183  33   No
9      3 126 88   41 39.3 0.704  27   No
10     9 119 80   35 29.0 0.263  29  Yes
The data has 332 rows and 8 columns.
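For anyone following along, the shape and column types can be checked directly. This is only a quick sketch: the name `pima` is a local alias introduced here for illustration, and it assumes the MASS package is available (it ships with standard R installations).

```r
# Quick structural check of the data shown above.
# `pima` is a hypothetical local alias, not a name from the original post.
pima <- MASS::Pima.te

dim(pima)          # number of rows and columns
str(pima)          # column types: seven numeric features plus the factor `type`
table(pima$type)   # how many No / Yes cases there are
```

Since `type` is the only non-numeric column, every other column can be pivoted into a single numeric `value` column.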
Everything works except the percentage part. The attempt below has two errors:
MASS::Pima.te |>
  dplyr::mutate(dplyr::across(-type, as.numeric)) |>
  tidyr::pivot_longer(-type, names_to = "var", values_to = "value") |>
  dplyr::summarise(
    value = sum(value),
    percentage = round(sum(value) / nrow(MASS::Pima.te), 2),
    .by = c(type, var)
  ) |>
  ggplot2::ggplot(ggplot2::aes(x = type, y = value)) +
  ggplot2::geom_col() +
  ggplot2::geom_text(
    ggplot2::aes(label = value), vjust = -.2
  ) +
  ggplot2::geom_text(
    ggplot2::aes(label = paste0(percentage, "%"), vjust = 3, color = "white")
  ) +
  ggplot2::scale_y_continuous(expand = c(0, 0, .2, 0)) +
  ggplot2::facet_wrap(~var, scales = "free") +
  ggplot2::labs(title = "Numerical values against y")
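Set the text colour with `col = "white"` as a fixed argument outside `aes()`; inside `aes()` the string is treated as a data mapping, not a literal colour. Then sum the Yes/No totals for each variable first, and compute the percentages from those totals: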
library(tidyverse)

MASS::Pima.te %>%
  pivot_longer(!type) %>%
  # total of each feature within each diabetes class
  summarise(across(value, sum), .by = c(type, name)) %>%
  # share of each class within a feature's overall total
  mutate(perc = proportions(value), .by = c(name)) %>%
  ggplot(aes(x = type, y = value)) +
  geom_col() +
  geom_text(aes(label = value),
            vjust = -.5) +
  geom_text(aes(label = scales::percent(perc),
                vjust = 1.5),
            col = "white") +
  facet_wrap(~ name, scales = "free") +
  scale_y_continuous(expand = expansion(mult = c(0.1, 0.25)))
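To make the two-step percentage calculation easier to follow, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical and only mimic the long format that `pivot_longer()` produces; it assumes dplyr 1.1.0 or later (for `.by`) and R 4.1 or later (for `|>`).

```r
library(dplyr)

# Hypothetical toy data in long format: one row per observation,
# with the class in `type` and the feature name in `name`.
toy <- data.frame(
  type  = c("No", "No", "Yes", "Yes", "No", "Yes"),
  name  = c("a",  "a",  "a",   "a",   "b",  "b"),
  value = c(1,    2,    3,     4,     10,   30)
)

totals <- toy |>
  # Step 1: one total per class within each variable.
  summarise(value = sum(value), .by = c(type, name)) |>
  # Step 2: each total as a share of its variable's overall total.
  mutate(perc = proportions(value), .by = name)

totals
```

For variable `a` the totals are 3 (No) and 7 (Yes), so the shares are 0.3 and 0.7; for `b` they are 10 and 30, giving 0.25 and 0.75. Because the shares are computed within each `name`, they add up to 1 in every facet, which is what the percentage labels are meant to show.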