I have a data frame whose structure and summary are shown below:
> str(filtered_lymph)
tibble [108 × 5] (S3: tbl_df/tbl/data.frame)
$ cluster : chr [1:108] "CD4+ Tcells" "CD4+ Tcells" "CD4+ Tcells" "CD4+ Tcells" ...
$ condition : chr [1:108] "B16_NTX_1" "B16_NTX_2" "B16_NTX_3" "B16_NTX_4" ...
$ total cells : num [1:108] 1.706 1.415 2.01 0.858 1.264 ...
$ New_Condition : Factor w/ 3 levels "NTX","Anti-PD-1",..: 1 1 1 1 1 1 1 1 1 2 ...
$ Tissue_Expression: chr [1:108] "Lymphoid" "Lymphoid" "Lymphoid" "Lymphoid" ...
> summary(filtered_lymph)
   cluster           condition          total cells
 Length:108         Length:108         Min.   : 0.3861
 Class :character   Class :character   1st Qu.: 2.6321
 Mode  :character   Mode  :character   Median : 4.5800
                                       Mean   : 6.0848
                                       3rd Qu.: 7.2591
                                       Max.   :23.2172
   New_Condition  Tissue_Expression
 NTX      :36     Length:108
 Anti-PD-1:32     Class :character
 AC484    :40     Mode  :character
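I am trying to find a way to run a two-tailed unpaired t-test between the control and test levels of the "New_Condition" variable: one test between "NTX" and "Anti-PD-1", and another between "NTX" and "AC484". I would like this computed separately for each cell cluster, i.e. "CD4+ T cells", "Progenitor T cells", etc.
I would then like to plot the p-values above the corresponding pairs of boxplots in ggplot. The plotting code is shown below: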
p_lymph <- ggplot(filtered_lymph, aes(x = cluster, y = `total cells`, fill = New_Condition, color = New_Condition)) +
  geom_boxplot(position = position_dodge(width = 0.8), size = 0.9, linetype = "solid", outlier.shape = NA) +
  scale_fill_manual(values = fill_colors) + #These two lines I am changing the legend label from New_Condition (as it is named in the original data column) to Treatment as stated on the original paper
  scale_color_manual(name = "Condition", values = outline_colors) +
  theme_minimal() +
  guides(fill = "none", color = "none") + #removing the legend
  labs(y = "Total cells (%)",
       x = NULL) +
  scale_x_discrete(labels = x_lym_labels) +
  theme(
    axis.line = element_line(color = "black", linewidth = 0.5, linetype = "solid"), #adding axis lines
    axis.text.x = element_blank(),
    axis.ticks.x = element_line(colour = "black", linewidth = 0.5), #manually adding in tick breaks
    axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
    panel.grid = element_blank() #removing grid lines
  ) +
  scale_y_continuous(breaks = seq(0, 25, by = 5),
                     limits = c(0, 25), #Adjusting the y-axis breaks
                     labels = c("0", "5", "10", "15", "20", "25") #customising the y-axis labels
  )
I previously tried the t.test function:
t.test(filtered_lymph$'total cells',filtered_lymph$cluster[filtered_lymph$New_Condition])$p.value
but got this error:
Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(y) : argument is not numeric or logical: returning NA
2: In var(y) : NAs introduced by coercion
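Any help would be greatly appreciated!
The answer, as alluded to in the comments, is to use something like geom_signif from ggsignif, but you will have to change the structure of your plot to get it to work: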
library(ggplot2)
library(ggsignif)
ggplot(filtered_lymph, aes(New_condition, `total cells`,
                           color = New_condition, fill = after_scale(alpha(color, 0.5)))) +
  geom_boxplot() +
  # geom_signif() defaults to wilcox.test; request the t-test the question asks for
  geom_signif(comparisons = list(c('NTX', 'Anti-PD-1'),
                                 c('NTX', 'Anti-CTLA-4')),
              test = 't.test',
              step_increase = 0.1, color = 'black',
              map_signif_level = function(p) sprintf("p = %.2g", p)) +
  scale_x_discrete('Cluster', expand = c(0.25, 0.5)) +
  scale_color_manual("Condition", values = c("#4d4b4c", "#4c517b", "#c63a41")) +
  # one facet per cluster, so the comparisons are made within each cluster
  facet_grid(.~cluster, switch = 'x') +
  theme_minimal() +
  theme(panel.spacing.x = unit(0, 'mm'),
        axis.text.x = element_blank(),
        axis.line = element_line(),
        strip.placement = 'outside')
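As for the error in the question: the second argument passed to t.test there is a character vector (cluster names indexed by a factor), so mean() and var() return NA and the test aborts. Both arguments need to be numeric vectors of the measurement, one per group. Below is a minimal sketch of that idea; the data frame `toy` and the vector `cluster_chr` are made-up stand-ins for a single cluster of the real data, not values from the question.

```r
# Hypothetical toy data standing in for one cluster of filtered_lymph
set.seed(42)
toy <- data.frame(
  New_Condition = factor(rep(c("NTX", "Anti-PD-1"), each = 8),
                         levels = c("NTX", "Anti-PD-1")),
  `total cells` = c(rnorm(8, 5, 1), rnorm(8, 7, 1)),
  check.names = FALSE
)

# Indexing a character column by a factor yields characters, not numbers;
# this is the kind of input that t.test() rejected in the question
cluster_chr <- rep("CD4+ Tcells", 16)
bad_y <- cluster_chr[toy$New_Condition]
is.numeric(bad_y)

# Pass the numeric measurement, split by condition, as x and y.
# t.test() defaults to a two-sided, unpaired Welch test.
x <- toy$`total cells`[toy$New_Condition == "NTX"]
y <- toy$`total cells`[toy$New_Condition == "Anti-PD-1"]
p_xy <- t.test(x, y)$p.value

# The formula interface gives the same result when the factor has two levels
p_formula <- t.test(`total cells` ~ New_Condition, data = toy)$p.value
```

For the real data, the same call would be made on the rows of one cluster at a time, with the factor restricted to the two levels being compared.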
Data used
I used the same data that was created for the OP's previous question, reproduced here:
set.seed(1)
filtered_lymph <- data.frame(
  cluster = rep(c("CD4+ T Cells", "Effector T Cells", "NK Cells",
                  "Progenitor T Cells"), each = 30),
  New_condition = rep(rep(c("NTX", "Anti-PD-1", "Anti-CTLA-4"), each = 10), 4),
  `total cells` = rnorm(120, 3, 0.5)^2, check.names = FALSE)
filtered_lymph[1:2] <- lapply(filtered_lymph[1:2], \(x) factor(x, unique(x)))
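If the p-values are also wanted as numbers rather than only as plot annotations, they can be computed per cluster in base R. The following is a sketch under the assumption that the simulated data above is used; the helper `p_vs_control` and the result `p_table` are illustrative names, and with the real data the second treatment level would be "AC484" rather than "Anti-CTLA-4".

```r
# Recreate the simulated data so this block runs on its own
set.seed(1)
filtered_lymph <- data.frame(
  cluster = rep(c("CD4+ T Cells", "Effector T Cells", "NK Cells",
                  "Progenitor T Cells"), each = 30),
  New_condition = rep(rep(c("NTX", "Anti-PD-1", "Anti-CTLA-4"), each = 10), 4),
  `total cells` = rnorm(120, 3, 0.5)^2, check.names = FALSE)

# Two-sided, unpaired Welch t-test of one treatment against the control
# (illustrative helper, not part of any package)
p_vs_control <- function(d, treatment, control = "NTX") {
  t.test(d$`total cells`[d$New_condition == treatment],
         d$`total cells`[d$New_condition == control])$p.value
}

treatments <- c("Anti-PD-1", "Anti-CTLA-4")
by_cluster <- split(filtered_lymph, filtered_lymph$cluster)

# One row per cluster and comparison
p_table <- do.call(rbind, lapply(names(by_cluster), function(cl) {
  data.frame(
    cluster    = cl,
    comparison = paste("NTX vs", treatments),
    p.value    = vapply(treatments,
                        function(tr) p_vs_control(by_cluster[[cl]], tr),
                        numeric(1)),
    row.names  = NULL
  )
}))
p_table
```

No multiple-comparison correction is applied here; p.adjust(p_table$p.value) is one option if that is needed.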