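I'm trying to make a plot in ggplot for a multinomial logistic regression. Not all levels of my nominal dependent variable are observed in every factor level. I want a plot with uniform bar widths. Once I use position_dodge(preserve='single'), I can get geom_bar to show the mean for each factor with bars of uniform width, but I can't get geom_point to line up with the same positions.
Here is my data; decide is the nominal dependent variable: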
decide=c("h", "g", "h", "g", "h", "g", "g", "h", "g", "h", "g", "h", "h", "h", "h", "h", "g", "h", "h", "r", "g", "h", "h", "h", "g", "g", "g", "h", "h", "h","h", "h", "h", "r", "h", "g", "g", "h", "g", "h", "g", "h", "g", "h", "d", "h", "h", "r", "h", "h", "g", "g", "g", "h", "g", "g", "g", "g", "h", "h")
dcsz=c("small", "medium", "small", "small", "medium", "small", "small", "medium", "medium", "small", "small", "medium", "small", "medium", "small", "medium", "small", "medium", "small", "small", "medium", "small", "medium", "medium", "medium", "small", "small", "medium", "small", "medium", "small", "medium", "small", "medium", "medium", "medium", "small", "medium", "medium", "small", "medium", "small", "medium", "medium", "small", "small", "medium", "small", "medium", "medium", "medium", "small", "small", "small", "small", "medium", "medium", "small", "small", "medium")
disthome=c(9.2,10.0,5.0,0.8,6.5,2.0,6.8,1.6,6.9,4.4,5.8,6.2,4.7,0.6,3.0,4.7,5.8,1.5,5.8,4.5,3.2,4.6,2.9,4.1,6.5,4.8,9.1,4.7,4.3,4.2,4.8,3.5,5.4,7.1,3.0,5.3,1.0,5.2,2.2,1.7,6.0,6.1,3.1,2.4,4.3,5.1,7.2,9.8,6.9,3.1,8.8,0.9,9.7,2.2,5.4,4.4,6.8,8.3,5.4,2.2)
gohome=data.frame(decide, dcsz, disthome)
This is how I get the means and standard errors:
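library(dplyr)   # provides %>%, group_by() and summarise()
library(ggplot2) # used for the plots below
gohome.disthome <- gohome %>%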
group_by(dcsz,decide) %>%
summarise(meandisthome = mean(na.omit(disthome)),
sedisthome=sd(na.omit(disthome))/sqrt(n()))
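Now for the details. This is my original code, from before I managed to align the error bars with the mean bars and separate the points by the nominal variable: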
ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity", position=position_dodge(),
data=gohome.disthome,aes(x=dcsz,y=meandisthome)) +
#overlay data points
geom_point(position=position_dodge()) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(),
aes(x=dcsz, fill = decide,y=meandisthome,
ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
width=0.3)+
#flip axis
coord_flip()
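Here is the code in which I got the error bars to align with the mean bars (using 0.9 in position_dodge), separated the points by the nominal variable (0.9), and also made the error bars and mean bars all the same width even though not every level of the dependent variable is observed in every factor level (I added preserve="single" to position_dodge). I can't add preserve='single' to geom_point, because then it no longer separates the points by the nominal variable, and using preserve='total' doesn't do anything either: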
ggplot(gohome,aes(y=disthome, x=dcsz, fill = decide)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",position=position_dodge(preserve='single'),
data=gohome.disthome,aes(x=dcsz,y=meandisthome))+
#overlay data points
geom_point(position=position_dodge(0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(0.9,preserve = "single"),
aes(x=dcsz, fill = decide,y=meandisthome,
ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome),
width=0.3)+
#flip axis
coord_flip()
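I also tried position_dodge2 instead of position_dodge, in various combinations and with preserve='total', but that doesn't solve the problem either. Either the points stay the same or they become completely scattered, with no separation. I would like to use position_dodge2 with preserve='total' as in the following link, because my problem is very similar (I'm not sure why it doesn't work for mine): https://github.com/tidyverse/ggplot2/issues/2712
Can someone help me fix my code? I need the points to line up perfectly with all the error bars.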
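The issue is that you missed setting the grouping variable in geom_errorbar and geom_point. From the docs:
"position_dodge() requires the grouping variable to be specified in the global or geom_* layer."
Try this: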
library(dplyr)
library(ggplot2)
ggplot(gohome,aes(y=disthome, x=dcsz)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",
position=position_dodge(),
data=gohome.disthome,
aes(x=dcsz, y=meandisthome, fill = decide)) +
#overlay data points
geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(width = 0.9),
aes(x=dcsz,
group = decide,
y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
#flip axis
coord_flip()
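Edit: After a lot of googling and checking several combinations, the best solution I could come up with to get bars of the same width is to simply fill up the data frame using tidyr::complete(decide, dcsz).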
gohome <- data.frame(decide,dcsz,disthome) %>%
tidyr::complete(decide, dcsz)
gohome.disthome <- gohome %>% group_by(dcsz,decide) %>%
summarise(meandisthome = mean(na.omit(disthome)), sedisthome=sd(na.omit(disthome))/sqrt(n()))
#> `summarise()` regrouping output by 'dcsz' (override with `.groups` argument)
ggplot(gohome,aes(y=disthome, x=dcsz)) +
#add bars and the preserve part keeps all bars same width
geom_bar(stat="identity",
position=position_dodge(),
data=gohome.disthome,
aes(x=dcsz, y=meandisthome, fill = decide)) +
#overlay data points
geom_point(aes(group = decide), position=position_dodge(width = 0.9)) +
#add error bars of means
geom_errorbar(data=gohome.disthome,stat="Identity",
position=position_dodge(width = 0.9),
aes(x=dcsz,
group = decide,
y=meandisthome,ymin=meandisthome-sedisthome,ymax=meandisthome+sedisthome), width = 0.5)+
#flip axis
coord_flip()
Created on 2020-06-29 by the reprex package (v0.3.0)
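To make it easier to see what tidyr::complete(decide, dcsz) contributes, here is a minimal sketch on a made-up toy data frame (the `toy` and `toy_full` names and their values are illustrative assumptions, not the asker's data). It only shows that complete() adds a row for each missing decide/dcsz combination and fills the remaining columns with NA, which is what gives every group a slot when dodging.

```r
library(tidyr)

# Hypothetical toy data: the "r" level is only observed for "small"
toy <- data.frame(
  decide   = c("g", "h", "r", "g", "h"),
  dcsz     = c("small", "small", "small", "medium", "medium"),
  disthome = c(1.0, 2.0, 3.0, 4.0, 5.0)
)

# complete() adds one row per missing decide/dcsz combination and
# fills the other columns (here disthome) with NA
toy_full <- complete(toy, decide, dcsz)

nrow(toy)       # 5 observed rows
nrow(toy_full)  # 6 rows: the missing ("r", "medium") pair was added
```

Because the added rows carry NA for disthome, the na.omit() calls in the summary step matter: a group that exists only through complete() ends up with a missing mean and standard error rather than a drawn bar.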
Dodging can be a pain. Given your use case, and assuming you are not using facets for anything else, it may be simpler to use them instead:
ggplot(gohome,
aes(x = decide, y = disthome)) +
stat_summary(geom = "bar", fun = "mean",
aes(fill = decide),
width = 1) +
geom_point() +
stat_summary(geom = "errorbar") + # default summary function is mean_se()
facet_grid(forcats::fct_rev(dcsz) ~ ., switch = "y") +
coord_flip() +
# optional: aesthetic changes to imitate the original look
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
panel.spacing = unit(0, "pt"),
strip.background = element_blank(),
strip.text.y.left = element_text(angle = 0))
(Note that I also didn't use the summary data frame, since the summary statistics within ggplot2 are sufficient.)
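As a quick check that ggplot2's built-in summary matches the hand-computed one, the sketch below compares mean_se() with the sd/sqrt(n) formula from the question on a small made-up vector (the vector `x` and the variable names are illustrative assumptions).

```r
library(ggplot2)

# Hypothetical sample values, used only for illustration
x <- c(1, 2, 3)

# mean_se() returns a one-row data frame with y (mean),
# ymin (mean - se) and ymax (mean + se)
res <- mean_se(x)

# The same quantities computed by hand, as in the question's summarise() call
manual_mean <- mean(x)
manual_se   <- sd(x) / sqrt(length(x))

all.equal(res$y,    manual_mean)              # TRUE
all.equal(res$ymin, manual_mean - manual_se)  # TRUE
all.equal(res$ymax, manual_mean + manual_se)  # TRUE
```

So stat_summary(geom = "errorbar") with its default function should draw the same mean plus or minus one standard error interval as the separate summary data frame, at least when there are no missing values.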