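I have been able to create the following visualization in Python, but I would like to recreate it in R. The data can be found further down, in the R code.
The Python code I wrote to generate the plot from the data is: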
import matplotlib.pyplot as plt
import pandas as pd

# Load the data
portfolio_data = pd.read_excel("Data.xlsx")

# Define colors for each Therapeutic Area (TA)
ta_colors = {
    'Malaria': 'lightblue',
    'HIV': 'lightgreen',
    # Additional colors can be added for other TAs if present in the dataset
}

# Define the width of the bars to adjust the diamond symbol position
bar_width = 0.8

plt.figure(figsize=(12, 8))

# For each phase, plot the projects, label them, color them by TA, add symbol for external funding, and draw border for NME type
for idx, phase in enumerate(portfolio_data['Phase'].unique()):
    phase_data = portfolio_data[portfolio_data['Phase'] == phase]
    bottom_offset = 0
    for index, row in phase_data.iterrows():
        edge_color = 'black' if row['Type'] == 'NME' else None  # Add border if project type is NME
        plt.bar(idx, 1, bottom=bottom_offset, color=ta_colors[row['TA']], edgecolor=edge_color, linewidth=1.2)
        plt.text(idx, bottom_offset + 0.5, row['Project'], ha='center', va='center', fontsize=10)
        # Add diamond symbol next to projects with external funding, positioned on the right border of the bar
        if row['Funding'] == 'External':
            plt.text(idx + bar_width/2, bottom_offset + 0.5, u'\u25C6', ha='right', va='center', fontsize=10, color='red')
        bottom_offset += 1

# Adjust x-ticks to match phase names
plt.xticks(range(len(portfolio_data['Phase'].unique())), portfolio_data['Phase'].unique())

# Create legends for the TAs and external funding separately
legend_handles_ta = [plt.Rectangle((0, 0), 1, 1, color=ta_colors[ta], label=ta) for ta in ta_colors.keys()]
legend_external_funding = [plt.Line2D([0], [0], marker='D', color='red', markersize=10, label='External Funding', linestyle='None')]
legend_nme = [plt.Rectangle((0, 0), 1, 1, edgecolor='black', facecolor='none', linewidth=1.2, label='NME Type')]

# Add legends to the plot
legend1 = plt.legend(handles=legend_handles_ta, title="Therapeutic Area (TA)", loc='upper left')
plt.gca().add_artist(legend1)
legend2 = plt.legend(handles=legend_external_funding, loc='upper right')
plt.gca().add_artist(legend2)
plt.legend(handles=legend_nme, loc='upper center')

plt.title('Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type')
plt.xlabel('Phase')
plt.ylabel('Number of Projects')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Trying to replicate the output in R, I came up with the following code:
library(ggplot2)
library(dplyr)

portfolio_data <- read.table(text = "Project Phase Funding TA Type
Project1 I Internal Malaria NME
Project2 I Internal Malaria NME
Project3 I Internal Malaria NME
Project4 I External HIV NME
Project5 I Internal HIV NME
Project10 II Internal Malaria NME
Project11 II Internal Malaria NME
Project12 II Internal Malaria NME
Project17 II External Malaria LCM
Project18 II External HIV LCM
Project19 II Internal HIV LCM
Project20 III External Malaria NME
Project21 III Internal Malaria NME
Project22 III External Malaria LCM
Project23 III Internal HIV LCM
Project24 III External HIV NME
Project25 III Internal Malaria LCM
Project26 III External HIV LCM
Project27 III Internal HIV NME
", header=TRUE)

portfolio_data <- portfolio_data %>%
  mutate(dummy = 1)

ta_colors <- c(
  Malaria = "lightblue",
  HIV = "lightgreen"
)

type_colors <- c(
  NME = "black",
  LCM = "white"
)

# Create the plot
plot <- ggplot(portfolio_data, aes(x = Phase, y = dummy, fill = TA, label = Project)) +
  geom_col() +
  # add project name as labels
  geom_text(aes(label = Project)
            , position = position_stack(vjust = .5)) +
  # add borders by Type
  geom_col(aes(color = Type)
           , fill = NA
           , size = 1) +
  # add colors for TA and Type
  scale_fill_manual(values = ta_colors) +
  scale_color_manual(values = type_colors) +
  # diamonds for projects with external funding
  geom_text(aes(label = if_else(Funding == "External", "\u25C6", NA))
            , vjust = 0.5, hjust = -6.8, color = "red", size = 5
            , position = position_stack(vjust = .5)) +
  # Theme and labels
  labs(title = "Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type",
       x = "Phase",
       y = "Number of Projects") +
  theme_minimal()

print(plot)
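The problem is that the borders are wrong. For example, Project24 is an NME project but is not drawn with a border. It seems the second geom_col() call reorders the projects, so the link between each project and its type is no longer maintained. Is there a way around this? I would like to use built-in functionality to draw the borders, but maybe I should consider adding a separate layer that draws boxes around the labels? I also tried geom_bar(), without success. Perhaps there is a better approach altogether. Any help is appreciated.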
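The main issue is the grouping. When using position_stack, the order of the stack is determined by the group aes. If it is not set explicitly, ggplot2 derives the group from the categorical variables mapped to the other aesthetics: in your case Phase (on x) and TA (on fill) in the first geom_col, but Phase and Type (on color) in the second one. Each layer has its own (default) grouping, and in the second geom_col setting fill = NA as a parameter drops the grouping by fill. Hence that layer is grouped, and therefore stacked, differently from the first.
So, especially for complex plots involving multiple geoms like yours, the default grouping will not always give the desired result, and you have to set it explicitly. In your case the stack should be determined and ordered by Project, i.e. add group = Project to aes().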
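To see why the outlines drift, here is a minimal Python sketch of a simplified stacking model (an assumption for illustration, not ggplot2's actual implementation): within one x position, segments are stacked bottom-up in ascending order of the grouping key, and rows with equal keys keep their data order. The phase III rows are copied from the question's data; the helper stack_positions is hypothetical.

```python
# Phase III rows from the question's data: (Project, TA, Type)
PHASE_III = [
    ("Project20", "Malaria", "NME"),
    ("Project21", "Malaria", "NME"),
    ("Project22", "Malaria", "LCM"),
    ("Project23", "HIV", "LCM"),
    ("Project24", "HIV", "NME"),
    ("Project25", "Malaria", "LCM"),
    ("Project26", "HIV", "LCM"),
    ("Project27", "HIV", "NME"),
]


def stack_positions(rows, key):
    """Return {project: slot}, where slot 0 is the bottom segment.

    Hypothetical, simplified stacking model: sort by the grouping key.
    Python's sort is stable, so rows with equal keys keep their order.
    """
    ordered = sorted(rows, key=key)
    return {row[0]: slot for slot, row in enumerate(ordered)}


# Default grouping: the fill layer is grouped by TA, the outline layer by Type.
fill_layer = stack_positions(PHASE_III, key=lambda r: r[1])
outline_layer = stack_positions(PHASE_III, key=lambda r: r[2])

# Explicit grouping: both layers share the same key, Project.
fill_fixed = stack_positions(PHASE_III, key=lambda r: r[0])
outline_fixed = stack_positions(PHASE_III, key=lambda r: r[0])

print("default:", fill_layer["Project24"], outline_layer["Project24"])
print("group = Project:", fill_fixed["Project24"], outline_fixed["Project24"])
```

In this model Project24's fill lands in slot 1 but its outline in slot 6, so the NME border is drawn around a different segment; with a shared key both layers put it in slot 4. ggplot2's real stacking direction differs (see reverse below), but flipping both layers the same way does not remove the mismatch.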
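Besides that, I made some additional tweaks. First, I reversed the order of the stack using position_stack(..., reverse = TRUE). Second, I set the outline color for type "LCM" to "transparent". Third, I switched to geom_point to add the diamonds, which allows using the shape aes and accordingly gives a third (shape) legend, as in the Python plot. Finally, I tweaked the legends via theme() and guides().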
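The effect of reverse = TRUE can be sketched in Python. By default position_stack stacks groups in reverse group order, so the first group ends up on top (matching the legend order); reverse = TRUE puts the first group at the bottom, like the Python loop that builds each bar upward from bottom_offset = 0. The function stack_bottom_up below is a hypothetical model of that rule, not ggplot2 code.

```python
def stack_bottom_up(groups, reverse=False):
    """Return the groups listed from the bottom of the stack to the top.

    Hypothetical model of position_stack(): by default the first group
    (in sorted order) ends up on top; reverse=True puts it at the bottom.
    """
    ordered = sorted(groups)
    return ordered if reverse else ordered[::-1]


phase_i = ["Project1", "Project2", "Project3", "Project4", "Project5"]
print(stack_bottom_up(phase_i))                # default: Project5 at the bottom
print(stack_bottom_up(phase_i, reverse=True))  # reversed: Project1 at the bottom
```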
library(ggplot2)

# Reuses portfolio_data (including the dummy column) and ta_colors from the question
type_colors <- c(
  NME = "black",
  LCM = "transparent"
)

# Shared position for all layers: vjust = .5 centres text and points in each
# segment; reverse = TRUE puts the first group at the bottom of the stack
ps <- position_stack(vjust = .5, reverse = TRUE)

ggplot(
  portfolio_data,
  aes(x = Phase, y = dummy, group = Project)
) +
  geom_col(aes(fill = TA), position = ps) +
  # linewidth requires ggplot2 >= 3.4.0 (older versions use size)
  geom_col(aes(color = Type),
    fill = NA,
    linewidth = 1, position = ps
  ) +
  geom_text(aes(label = Project), position = ps) +
  # Diamond (shape 18) for externally funded projects; other rows get no shape
  geom_point(
    aes(
      x = as.numeric(factor(Phase)) + .35,
      shape = Funding == "External"
    ),
    color = "red", size = 5,
    position = ps
  ) +
  scale_shape_manual(
    values = c(18, NA),
    labels = "External",
    breaks = "TRUE"
  ) +
  scale_fill_manual(
    values = ta_colors
  ) +
  scale_color_manual(
    values = type_colors,
    breaks = "NME"
  ) +
  # Theme and labels
  labs(
    title = "Number of Projects by Phase, Colored by TA, with Symbol on Bar Border for External Funding and Border for NME Type",
    x = "Phase",
    y = "Number of Projects",
    shape = "Funding"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top",
    legend.direction = "vertical"
  ) +
  guides(
    color = guide_legend(title.position = "top", order = 2),
    fill = guide_legend(title.position = "top", order = 1),
    shape = guide_legend(title.position = "top", order = 3)
  )
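The diamond placement in geom_point relies on a bit of arithmetic: as.numeric(factor(Phase)) turns the discrete phases into their integer positions (factor levels are sorted, so for these labels I, II, III become 1, 2, 3), and + .35 nudges the marker to the right. geom_col's default bar width is 90% of the x spacing, so each bar's right border sits at position + 0.45 and the diamond is centred 0.1 units inside it. A Python sketch of that calculation, with the hypothetical helper diamond_positions:

```python
def diamond_positions(phases, offset=0.35, width=0.9):
    """Map each phase to (marker x, right edge of its bar).

    Mimics as.numeric(factor(Phase)) + offset: levels are the sorted
    unique values, numbered from 1. width=0.9 assumes geom_col's default.
    """
    levels = sorted(set(phases))
    return {
        phase: (pos + offset, pos + width / 2)
        for pos, phase in enumerate(levels, start=1)
    }


for phase, (marker, edge) in diamond_positions(["III", "I", "II", "I"]).items():
    print(phase, round(marker, 2), round(edge, 2))
```

If the bar width is changed via geom_col(width = ...), the offset would need adjusting so the diamond stays inside width / 2.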