I am trying to draw the plot below: a histogram with overlaid bars for two groups.
The dataset is fairly large, so I have added a sample of only 200 rows:
graph2 <- structure(list(total_visits_SHS = structure(c(4, 2, NA, NA, 2,
4, 6, 3, 3, 1, 12, NA, 3, NA, 12, 2, 2, 1, 2, NA, NA, 12, 3,
8, 3, NA, 1, 1, NA, 4, NA, 6, NA, NA, 2, 5, NA, NA, 15, 10, NA,
51, NA, 3, NA, 3, 1, 5, 6, 2, 8, 12, 50, 1, 4, 2, 2, 30, NA,
16, 2, 10, NA, 2, 5, 1, NA, 10, 3, NA, 24, 1, 7, 10, 5, NA, 10,
2, 1, 20, 1, NA, 1, 2, 1, NA, 3, 1, 2, 3, 1, 20, 6, 11, 4, 1,
4, 2, 5, 24, 8, 2, NA, NA, 2, 1, 12, 30, NA, NA, 10, NA, 3, 1,
4, 2, NA, 6, NA, 7, 50, 60, NA, 1, 1, 6, 7, NA, 4, 2, NA, 6,
NA, 3, 3, 4, 10, 1, 6, 5, NA, 10, 1, NA, 1, 1, NA, 3, 12, 40,
1, 3, 6, 4, 3, 1, 2, 24, NA, NA, NA, 10, 12, 2, 1, 2, 2, 1, 1,
3, 18, 1, 4, 8, 4, 15, 4, 2, NA, 3, 20, NA, NA, NA, 3, 4, 2,
2, 2, 2, 2, 1, 1, NA, NA, 16, 1, 1, 7, NA), label = "number of medical consultation (last 12 months)", format.stata = "%9.0g"),
healthy = structure(c(0, 1, 1, 1, 0, 1, 1, 0, 1, 1, NA, 0,
0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1,
0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,
0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,
1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1,
1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1,
1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1), label = "Has no health condition", format.stata = "%9.0g"),
total_visits_SCI = structure(c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), label = "SwiSCI - Total healthcare vistis", format.stata = "%9.0g"),
total_visits = c(4, 2, 0, 0, 2, 4, 6, 3, 3, 1, 12, 0, 3,
0, 12, 2, 2, 1, 2, 0, 0, 12, 3, 8, 3, 0, 1, 1, 0, 4, 0, 6,
0, 0, 2, 5, 0, 0, 15, 10, 0, 51, 0, 3, 0, 3, 1, 5, 6, 2,
8, 12, 50, 1, 4, 2, 2, 30, 0, 16, 2, 10, 0, 2, 5, 1, 0, 10,
3, 0, 24, 1, 7, 10, 5, 0, 10, 2, 1, 20, 1, 0, 1, 2, 1, 0,
3, 1, 2, 3, 1, 20, 6, 11, 4, 1, 4, 2, 5, 24, 8, 2, 0, 0,
2, 1, 12, 30, 0, 0, 10, 0, 3, 1, 4, 2, 0, 6, 0, 7, 50, 60,
0, 1, 1, 6, 7, 0, 4, 2, 0, 6, 0, 3, 3, 4, 10, 1, 6, 5, 0,
10, 1, 0, 1, 1, 0, 3, 12, 40, 1, 3, 6, 4, 3, 1, 2, 24, 0,
0, 0, 10, 12, 2, 1, 2, 2, 1, 1, 3, 18, 1, 4, 8, 4, 15, 4,
2, 0, 3, 20, 0, 0, 0, 3, 4, 2, 2, 2, 2, 2, 1, 1, 0, 0, 16,
1, 1, 7, 0)), row.names = c(NA, -200L), label = "TEL17_CH", class = c("tbl_df",
"tbl", "data.frame"))
The code that I have generated is the following:
library(tidyverse)

graph2 |>
  filter(!is.na(healthy)) |>
  ggplot(aes(x = total_visits, fill = as.factor(healthy))) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 alpha = 0.6, color = "white", position = "identity",
                 breaks = seq(0, 100, by = 1)) +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_fill_manual(labels = c("TSCI", "SHS"), values = c("blue", "red")) +
  labs(fill = "")
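How can I make the values along the x axis easier to see, especially the high ones? In the plot that ggplot produces, their bars are not visible. The plot I posted was made with the full dataset.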
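I suggest breaking the y axis when the distribution is hard to read because the values differ greatly in magnitude. It works like zooming in, so that the low-frequency values in the dataset become easier to see. One package that handles broken axes is ggbreak; you can find more details in its tutorial. In your case, I experimented with the y values and found that cutting out the range (0.04, 0.1) worked best. Your code is updated by adding one line before the last statement: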
library(tidyverse)
library(ggbreak)

graph2 |>
  filter(!is.na(healthy)) |>
  ggplot(aes(x = total_visits, fill = as.factor(healthy))) +
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 alpha = 0.6, color = "white", position = "identity",
                 breaks = seq(0, 100, by = 1)) +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_fill_manual(labels = c("TSCI", "SHS"), values = c("blue", "red")) +
  # cut the y axis between 0.04 and 0.1 so the short bars get more room
  scale_y_break(c(0.04, 0.1), scales = .5) + theme_minimal() +
  labs(fill = "")
This is the output; you can adjust the break points as needed.
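If you would rather not find the break range by trial and error, one option is to compute the bar heights yourself and look for the gap between the tallest bars and the rest. The snippet below is only a rough sketch in base R. The `toy` data frame is made up for illustration and is not your survey data. It also relies on two assumptions of mine: that `geom_histogram()` uses right-closed bins with the lowest value included, and that `sum(count)` in `after_stat()` runs over both groups together.

```r
# Hypothetical toy data standing in for graph2 (not the real survey values)
toy <- data.frame(
  total_visits = c(0, 0, 0, 0, 1, 2, 2, 3, 5, 12, 30, 60),
  healthy      = c(0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0)
)

# Same breaks as in the plot; right-closed bins with the lowest value included
# (assumed to match the default binning of geom_histogram())
bins <- cut(toy$total_visits, breaks = seq(0, 100, by = 1),
            right = TRUE, include.lowest = TRUE)

# Counts per group and bin, divided by the overall total,
# mirroring after_stat(count / sum(count))
counts <- table(healthy = toy$healthy, bin = bins)
props  <- counts / sum(counts)

# Heights of the non-empty bars, tallest first
heights <- sort(props[props > 0], decreasing = TRUE)
print(round(heights, 3))
```

With your real data, replace `toy` with `graph2 |> filter(!is.na(healthy))`. A large jump between the first one or two heights and the remaining ones is a reasonable candidate for the range passed to `scale_y_break()`.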