我来这里是因为经过数小时的研究和失败的尝试,我不知道下一步该怎么做。
我有一个数据库(使用open_excel命令通过dyplr
打开),看起来像下面的数据库(但更复杂,有更多变量):
> dput(open)
structure(list(Subject = c(1, 2, 3, 4, 5), `Happy - Before` = c(4,
4, 2, 1, 7), `Courageous - Before` = c(5, 2, 1, 3, 4), `Strange - Before` = c(1,
2, 1, 4, 6), `Happy - After` = c(4, 2, 6, 2, 2), `Courageous - After` = c(7,
1, 5, 1, 2), `Strange - After` = c(3, 7, 4, 5, 4)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 5 x 7
Subject `Happy - Before` `Courageous - B… `Strange - Befo… `Happy - After`
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 4 5 1 4
2 2 4 2 2 2
3 3 2 1 1 6
4 4 1 3 4 2
5 5 7 4 6 2
# … with 2 more variables: `Courageous - After` <dbl>, `Strange - After` <dbl>
我的目标是绘制具有某些特异性的密度图:
Density of scores obtained by all the subjects for each ability trait on a scale from 1 to 7
您可以在我的(糟糕的)图表上看到,我试图通过密度图以1到7(x轴)的比例显示所有主题的响应,但是对于每个特征我都有(y-轴),因此与[测试]之前和测试之后做出的响应分开。而且我需要获得相同的图例(在左边显示勇气,在右边显示不勇气)。参与者对标尺的回答越接近7,他就越[快乐,勇敢,沮丧,焦虑....]越接近1,他就越[不快乐,不高兴勇敢,不沮丧...]我尽力了(使用ggplot2
模板,尝试融化所有东西,但我对R和语言编程非常陌生:/)我所有的变量都具有这样的名称:[trait1]_before
,[trait2]_before
,[trait1]_After
,[trait2]_After
我希望这篇帖子清楚。如果没有,我将很高兴添加信息!谢谢大家(对不起我的英语不好)
密度图假定沿x轴有连续变量,而您的示例仅从1到7。这意味着您可以绘制密度超过0和7的尾巴,或者在这些点处强制截断。值。
data <- structure(list(Subject = c(1, 2, 3, 4, 5),
`Happy - Before` = c(4, 4, 2, 1, 7),
`Courageous - Before` = c(5, 2, 1, 3, 4),
`Strange - Before` = c(1, 2, 1, 4, 6),
`Happy - After` = c(4, 2, 6, 2, 2),
`Courageous - After` = c(7, 1, 5, 1, 2),
`Strange - After` = c(3, 7, 4, 5, 4)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
library(ggplot2)
library(ggridges)
library(grid)
library(gtable)
dataPivot <- data %>%
pivot_longer(-Subject, names_to = "measure", values_to = "score") %>%
mutate(status = sub(".* - ", "", measure),
feature = sub(" - .*", "", measure),
featureOpposite = paste('Not', feature)) %>%
mutate_if(is.character, as.factor)
如果不想截断图:
# Create the first plot with the axis on the left
p1 <- ggplot(dataTest, aes(x = score, y = feature)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3) +
scale_x_continuous(breaks = c(1, 7)) +
labs(y = NULL) +
theme_ridges() +
theme(legend.position="bottom")
# Create a second plot with the legend on the right
p2 <- ggplot(dataTest, aes(x = score, y = featureOpposite)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3) +
scale_y_discrete(position = "right") +
theme_ridges() +
theme(legend.position="bottom")
# Convert both plots to gtables
g1 <- ggplot_gtable(ggplot_build(p1))
g2 <- ggplot_gtable(ggplot_build(p2))
# Add an empty column to the left side of the first plot to make room for the right
# axis
g1 <- gtable_add_cols(g1, widths = unit(0.2, "null"), pos = -1)
# overlap the panel of the 2nd plot on that of the 1st plot
pp <- c(subset(g1$layout, name == "axis-r", se = t:r))
# Add the y-axis from the second plot
g <- gtable_add_grob(g1, g2$grobs[[which(g2$layout$name == "axis-r")]], pp$t, pp$r,
pp$b, pp$r)
grid.draw(g)
如果要截断两端:
# Here we are just adding in a height variable, changing stat to density and adding
# trim = T
p1 <- ggplot(dataTest, aes(x = score, y = feature, height = ..density..)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3, stat = "density",
trim = TRUE) +
scale_x_continuous(breaks = c(1, 7)) +
labs(y = NULL) +
theme_ridges() +
theme(legend.position="bottom")
p2 <- ggplot(dataTest, aes(x = score, y = featureOpposite, height = ..density..)) +
geom_density_ridges2(aes(fill = status), scale = 0.7, alpha = .3, stat = "density",
trim = TRUE) +
scale_y_discrete(position = "right") +
theme_ridges() +
theme(legend.position="bottom")
g1 <- ggplot_gtable(ggplot_build(p1))
g2 <- ggplot_gtable(ggplot_build(p2))
g1 <- gtable_add_cols(g1, widths = unit(0.2, "null"), pos = -1)
## overlap the panel of the 2nd plot on that of the 1st plot
pp <- c(subset(g1$layout, name=="axis-r", se=t:r))
g <- gtable_add_grob(g1, g2$grobs[[which(g2$layout$name=="axis-r")]], pp$t, pp$r,
pp$b, pp$r)
grid.draw(g)
我想知道(也由@Amanda暗示)密度图是否适合此类数据,但是将其放在一边,这是一个较短的版本,它不会尝试像@Amanda那样处理数据。
您可以使用bw
来尝试不同的带宽选择方法(默认情况下,对于高斯分布,默认情况下不建议使用nrd0
,因为它倾向于过度平滑)。
library(tidyverse)
open_long <- open %>%
pivot_longer(-Subject, names_to = c("state", "time"), names_pattern = "([A-Za-z]+) - ([A-Za-z]+)")
ggplot(open_long, aes(x = value, fill = time)) +
geom_density(alpha = 0.5, bw = "SJ") +
facet_wrap(~state, ncol = 1) +
theme_bw()
由reprex package(v0.3.0)在2019-12-20创建