How to add value labels on the flows of an alluvial/Sankey diagram (with R ggalluvial)?

Question

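I would like to label the "flow" parts of an alluvial/Sankey diagram in R.

Labeling the strata (the columns) is easy, but I cannot label the flows that connect them. Reading the documentation and experimenting have both failed so far.

In the example below, "freq" should appear as a label on each flow segment.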

chart

library(ggplot2)
library(ggalluvial)

data(vaccinations)
levels(vaccinations$response) <- rev(levels(vaccinations$response))
ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")

Answer 1

One option is to take the raw counts and use them as labels on the flow segments:

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(stat = "flow", nudge_x = 0.2) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")

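If you want finer control over how the flows are labeled, you can extract the layer data and do the calculations on it yourself. For example, we can compute fractions for the starting positions only, as follows: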

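# Assume 'g' is the previous plot object saved under a variable
newdat <- layer_data(g)  # defaults to layer 1, here geom_flow()
# Keep only the rows at the start of each flow. The 'side' column comes from
# the flow stat; if it is missing, check names(newdat) for its equivalent.
newdat <- newdat[newdat$side == "start", ]
# Convert each label to its share within its stratum and x position
groups <- split(newdat, interaction(newdat$stratum, newdat$x))
groups <- lapply(groups, function(dat) {
  dat$label <- dat$label / sum(dat$label)
  dat
})
newdat <- do.call(rbind, groups)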

ggplot(vaccinations,
       aes(x = survey, stratum = response, alluvium = subject,
           y = freq,
           fill = response, label = freq)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(alpha = .5) +
  geom_text(stat = "stratum", size = 3) +
  geom_text(data = newdat, aes(x = xmin + 0.4, y = y, label = format(label, digits = 1)),
            inherit.aes = FALSE) +
  theme(legend.position = "bottom") +
  ggtitle("vaccination survey responses at three points in time")

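Where exactly to put the labels is still a judgment call. Placing them at the start of each flow is the easy way; if you want the labels roughly in the middle of the flows and dodging one another, some extra processing is needed.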
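
The per-group share computation above can be checked without the plot. The following is a minimal sketch on a made-up data frame: `toy`, its values, and the `share` column are invented for illustration, and only the column names `x`, `stratum`, and `label` mirror the layer data. It uses base R's ave() as a compact alternative to the split/lapply/rbind sequence.

```r
# Toy stand-in for the flow layer data; all values are invented.
toy <- data.frame(
  x       = c(1, 1, 1, 1),
  stratum = c("A", "A", "B", "B"),
  label   = c(30, 10, 5, 15)
)

# Share of each flow within its (stratum, x) group.
# ave() applies FUN within each combination of the grouping variables
# and returns the results in the original row order.
toy$share <- ave(toy$label, toy$stratum, toy$x,
                 FUN = function(v) v / sum(v))

print(toy)
```

Because ave() keeps the original row order, the result can be assigned straight back as a column, which avoids the rbind step.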

Answer 2

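I ran the code on a similar dataset and got repeated values across all flows within the same survey. Because every patient's responses differ, all frequencies in my dataset are always 1. Could that be causing the problem? My surveys may also contain missing values, which could be another cause. Initially I used "response" as the label, which produced repeated response names: Missing, Always, and so on. Changing the label to "freq" or to a percentage did not fix it, because each flow still shows a repeated value for every survey.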
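
Data in which every row has a frequency of 1 can be examined before plotting. As a minimal sketch on invented data (`toy`, its column values, and the survey names "s1" and "s2" are all made up), totaling "freq" per survey and response gives the per-stratum totals to compare against what the plot displays. This is only a diagnostic under those assumptions, not a confirmed fix for the repeated labels.

```r
# Invented individual-level data: one row per subject per survey,
# with freq always equal to 1.
toy <- data.frame(
  subject  = rep(1:4, times = 2),
  survey   = rep(c("s1", "s2"), each = 4),
  response = c("Always", "Always", "Never", "Never",
               "Always", "Never", "Never", "Never"),
  freq     = 1
)

# Total frequency per survey and response.
totals <- aggregate(freq ~ survey + response, data = toy, FUN = sum)

print(totals)
```

If the totals look right here but the plot still repeats a value of 1, the labels are probably being drawn per subject rather than per aggregated group.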
