How can I use ggplot with labels, fill, color, and symbols to visualize different dimensions in a stacked bar chart?


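I have been able to create the following visualization in Python, but I would like to recreate it in R. The data can be found further down, in the R code.

The Python code I wrote to generate the plot below from the data is: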

import matplotlib.pyplot as plt
import pandas as pd

# Load the data
portfolio_data = pd.read_excel("Data.xlsx")

# Define colors for each Therapeutic Area (TA)
ta_colors = {
    'Malaria': 'lightblue',
    'HIV': 'lightgreen',
    # Additional colors can be added for other TAs if present in the dataset
}

# Define the width of the bars to adjust the diamond symbol position
bar_width = 0.8

plt.figure(figsize=(12, 8))

# For each phase, plot the projects, label them, color them by TA, add symbol for external funding, and draw border for NME type
for idx, phase in enumerate(portfolio_data['Phase'].unique()):
    phase_data = portfolio_data[portfolio_data['Phase'] == phase]
    
    bottom_offset = 0
    for index, row in phase_data.iterrows():
        edge_color = 'black' if row['Type'] == 'NME' else None  # Add border if project type is NME
        plt.bar(idx, 1, bottom=bottom_offset, color=ta_colors[row['TA']], edgecolor=edge_color, linewidth=1.2)
        plt.text(idx, bottom_offset + 0.5, row['Project'], ha='center', va='center', fontsize=10)
        
        # Add diamond symbol next to projects with external funding, positioned on the right border of the bar
        if row['Funding'] == 'External':
            plt.text(idx + bar_width/2, bottom_offset + 0.5, u'\u25C6', ha='right', va='center', fontsize=10, color='red')
        
        bottom_offset += 1

# Adjust x-ticks to match phase names
plt.xticks(range(len(portfolio_data['Phase'].unique())), portfolio_data['Phase'].unique())

# Create legends for the TAs and external funding separately
legend_handles_ta = [plt.Rectangle((0, 0), 1, 1, color=ta_colors[ta], label = ta) for ta in ta_colors.keys() ]
legend_external_funding = [plt.Line2D([0], [0], marker='D', color='red', markersize=10, label='External Funding', linestyle='None')]
legend_nme = [plt.Rectangle((0, 0), 1, 1, edgecolor='black', facecolor='none', linewidth=1.2, label='NME Type')]

# Add legends to the plot
legend1 = plt.legend(handles=legend_handles_ta, title="Therapeutic Area (TA)", loc='upper left')
plt.gca().add_artist(legend1)
legend2 = plt.legend(handles=legend_external_funding, loc='upper right')
plt.gca().add_artist(legend2)
plt.legend(handles=legend_nme, loc='upper center')

plt.title('Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type')
plt.xlabel('Phase')
plt.ylabel('Number of Projects')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The result looks like this: [image]

When trying to replicate the output in R, I tried the following code:

library(ggplot2)
library(dplyr)

portfolio_data <- read.table(text = "Project    Phase   Funding TA  Type
Project1    I   Internal    Malaria NME
Project2    I   Internal    Malaria NME
Project3    I   Internal    Malaria NME
Project4    I   External    HIV NME
Project5    I   Internal    HIV NME
Project10   II  Internal    Malaria NME
Project11   II  Internal    Malaria NME
Project12   II  Internal    Malaria NME
Project17   II  External    Malaria LCM
Project18   II  External    HIV LCM
Project19   II  Internal    HIV LCM
Project20   III External    Malaria NME
Project21   III Internal    Malaria NME
Project22   III External    Malaria LCM
Project23   III Internal    HIV LCM
Project24   III External    HIV NME
Project25   III Internal    Malaria LCM
Project26   III External    HIV LCM
Project27   III Internal    HIV NME
", header=TRUE)

portfolio_data <- portfolio_data %>%
  mutate(dummy = 1)


ta_colors <- c(
  Malaria = "lightblue",
  HIV = "lightgreen"
)

type_colors <- c(
  NME = "black",
  LCM = "white"
)

# Create the plot
plot <- ggplot(portfolio_data, aes(x = Phase, y = dummy, fill = TA, label = Project)) +
  
  geom_col() +

  #add project name as labels
  geom_text(aes(label = Project)
            , position = position_stack(vjust = .5)) +
  
  #add borders by Type
  geom_col(aes(color = Type)
           , fill = NA
           , size = 1) +
  
  #add colors for TA and Type
  scale_fill_manual(values = ta_colors) +
  scale_color_manual(values = type_colors) +
  
  #diamonds for projects with external funding
  geom_text(aes(label = if_else(Funding == "External", "\u25C6", NA))
            , vjust = 0.5, hjust = -6.8, color = "red", size = 5
            , position = position_stack(vjust = .5)) +
  
  # Theme and labels
  labs(title = "Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type",
       x = "Phase", 
       y = "Number of Projects") +
  theme_minimal()

print(plot)

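I get the following result: [image]

The problem is that the borders are not correct. For example, Project 24 is an NME project. It seems that the second geom_col() call reorders the projects, so the link between project and type is no longer maintained. Is there a way to fix this? I would like to use built-in functionality to draw the borders, but perhaps I should consider adding a separate layer with boxes around the labels instead? I also tried geom_bar(), without success. Maybe there is a better approach altogether. Any help is appreciated.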

r ggplot2 label geom-text geom-col
1 Answer

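The main issue is the grouping. When using position_stack, the order of the stack is determined by the group aes. If it is not set explicitly, ggplot2 infers the group from the categorical variables mapped to other aesthetics, e.g. in your case the grouping is set based on fill, color and label. Additionally, each layer has its own (default) grouping: in your second geom_col, setting fill = NA drops the grouping by fill. Hence, you get a different grouping for that layer.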

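Hence, especially for complex plots like yours that involve multiple geoms, the default grouping will not always give you the desired result. Instead, you have to set it explicitly. In your case the stack should be determined and ordered by Project, i.e. add group = Project to aes().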
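
To make the effect of per-layer grouping concrete, here is a small toy model in plain Python (since the question started from Python). It is only a sketch under a simplifying assumption: within each phase, a layer's boxes are ordered by that layer's group key, with ties kept in data order. The helper stack_slots is hypothetical and is not ggplot2's implementation; only the Phase III rows from the question are used.

```python
from collections import defaultdict

# Phase III rows from the question's data: (project, phase, ta, type)
ROWS = [
    ("Project20", "III", "Malaria", "NME"),
    ("Project21", "III", "Malaria", "NME"),
    ("Project22", "III", "Malaria", "LCM"),
    ("Project23", "III", "HIV", "LCM"),
    ("Project24", "III", "HIV", "NME"),
    ("Project25", "III", "Malaria", "LCM"),
    ("Project26", "III", "HIV", "LCM"),
    ("Project27", "III", "HIV", "NME"),
]


def stack_slots(rows, group_key):
    """Toy model of stacking (an assumption, not ggplot2's code).

    Within each phase, rows are ordered by their group key (ties keep
    data order) and numbered 0, 1, 2, ... Returns {project: slot}.
    """
    by_phase = defaultdict(list)
    for row in rows:
        by_phase[row[1]].append(row)
    slots = {}
    for phase_rows in by_phase.values():
        # sorted() is stable, so rows with equal keys keep their data order
        for slot, row in enumerate(sorted(phase_rows, key=group_key)):
            slots[row[0]] = slot
    return slots


# Default grouping differs per layer in the question's plot
fill_slots = stack_slots(ROWS, lambda r: r[2])    # fill layer: grouped by TA
border_slots = stack_slots(ROWS, lambda r: r[3])  # border layer: grouped by Type

# Which project's border is drawn in each slot of the border layer?
border_owner = {slot: project for project, slot in border_slots.items()}

# Projects whose filled box is outlined with another project's border
misplaced = {
    project: border_owner[slot]
    for project, slot in fill_slots.items()
    if border_owner[slot] != project
}
print(misplaced)

# Analogue of group = Project: every layer uses the same key
fixed_fill = stack_slots(ROWS, lambda r: r[0])
fixed_border = stack_slots(ROWS, lambda r: r[0])
print(fixed_fill == fixed_border)
```

In this toy model the filled box for Project24 lands in the slot where the border layer draws the outline of Project23, an LCM project, which is the kind of mismatch described in the question. Once both layers are keyed by project, the slots agree.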

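Besides that, I made some additional tweaks. First, I reversed the order of the stack using position_stack(..., reverse = TRUE). Second, I set the outline color for type "LCM" to "transparent". Third, I switched to geom_point to add the diamonds, which allows using the shape aes and so yields a third (shape) legend, as in the Python plot. Finally, I tweaked the legends via theme() and guides().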

library(ggplot2)

type_colors <- c(
  NME = "black",
  LCM = "transparent"
)

ps <- position_stack(vjust = .5, reverse = TRUE)

ggplot(
  portfolio_data,
  aes(x = Phase, y = dummy, group = Project)
) +
  geom_col(aes(fill = TA), position = ps) +
  geom_col(aes(color = Type),
    fill = NA,
    linewidth = 1, position = ps
  ) +
  geom_text(aes(label = Project), position = ps) +
  geom_point(
    aes(
      x = as.numeric(factor(Phase)) + .35,
      shape = Funding == "External"
    ),
    color = "red", size = 5,
    position = ps
  ) +
  scale_shape_manual(
    values = c(18, NA),
    labels = "External",
    breaks = "TRUE"
  ) +
  scale_fill_manual(
    values = ta_colors
  ) +
  scale_color_manual(
    values = type_colors,
    breaks = "NME"
  ) +
  # Theme and labels
  labs(
    title = "Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type",
    x = "Phase",
    y = "Number of Projects",
    shape = "Funding"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top",
    legend.direction = "vertical"
  ) +
  guides(
    color = guide_legend(title.position = "top", order = 2),
    fill = guide_legend(title.position = "top", order = 1),
    shape = guide_legend(title.position = "top", order = 3)
  )
