How do I perform t-tests and plot p-values for between-group comparisons on a grouped boxplot (ggplot)?


I have a data frame whose structure and summary are shown below:

> str(filtered_lymph)
tibble [108 × 5] (S3: tbl_df/tbl/data.frame)
 $ cluster          : chr [1:108] "CD4+ Tcells" "CD4+ Tcells" "CD4+ Tcells" "CD4+ Tcells" ...
 $ condition        : chr [1:108] "B16_NTX_1" "B16_NTX_2" "B16_NTX_3" "B16_NTX_4" ...
 $ total cells      : num [1:108] 1.706 1.415 2.01 0.858 1.264 ...
 $ New_Condition    : Factor w/ 3 levels "NTX","Anti-PD-1",..: 1 1 1 1 1 1 1 1 1 2 ...
 $ Tissue_Expression: chr [1:108] "Lymphoid" "Lymphoid" "Lymphoid" "Lymphoid" ...

> summary(filtered_lymph)
   cluster           condition          total cells     
 Length:108         Length:108         Min.   : 0.3861  
 Class :character   Class :character   1st Qu.: 2.6321  
 Mode  :character   Mode  :character   Median : 4.5800  
                                       Mean   : 6.0848  
                                       3rd Qu.: 7.2591  
                                       Max.   :23.2172  
   New_Condition Tissue_Expression 
 NTX      :36    Length:108        
 Anti-PD-1:32    Class :character  
 AC484    :40    Mode  :character 

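I am trying to find a way to run two-tailed unpaired t-tests between the control and test levels of the "New_Condition" variable: one between "NTX" and "Anti-PD-1", and another between "NTX" and "AC484". I would like these to be calculated separately for each cell cluster, i.e. "CD4+ T cells", "Progenitor T cells", etc.

I would then like to plot the p-values above the corresponding pairs of boxplots on a ggplot, the code for which is shown below: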

p_lymph <- ggplot(filtered_lymph, aes(x = cluster, y = `total cells`, fill = New_Condition, color = New_Condition)) +
  geom_boxplot(position = position_dodge(width = 0.8), size = 0.9,linetype="solid", outlier.shape=NA) +
  scale_fill_manual(values=fill_colors) + #These two lines I am changing the legend label from New_Condition (as it is named in the original data column) to Treatment as stated on the original paper
  scale_color_manual(name="Condition", values = outline_colors) +
  theme_minimal()+
  guides(fill="none",color="none")+ #removing the legend
  labs(y="Total cells (%)",
       x=NULL)+
  scale_x_discrete(labels=x_lym_labels)+
  theme(
    axis.line=element_line(color="black",linewidth=0.5,linetype="solid"), #adding axis lines
    axis.text.x=element_blank(),
    axis.ticks.x=element_line(colour="black",linewidth=0.5),#manually adding in tick breaks
    axis.ticks.y=element_line(colour="black", linewidth=0.5),
    panel.grid=element_blank() #removing grid lines
  )+
      scale_y_continuous(breaks=seq(0,25, by=5),
                         limits=c(0,25), #Adjusting the y-axis breaks
                         labels=c("0","5","10","15","20","25") #customising the y-axis labels
                        )

I previously tried the t.test function:

t.test(filtered_lymph$'total cells',filtered_lymph$cluster[filtered_lymph$New_Condition])$p.value

But it produced this error:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In mean.default(y) : argument is not numeric or logical: returning NA
2: In var(y) : NAs introduced by coercion

Any help would be greatly appreciated!

r ggplot2 rstudio p-value t-test
1 Answer

The answer, as hinted at in the comments, is to use something like geom_signif from ggsignif, but you have to change the structure of the plot to make it work:

library(ggplot2)
library(ggsignif)

ggplot(filtered_lymph, aes(New_condition, `total cells`,
       color = New_condition, fill = after_scale(alpha(color, 0.5)))) +
  geom_boxplot() +
  geom_signif(comparisons = list(c('NTX', 'Anti-PD-1'),
                                 c('NTX', 'Anti-CTLA-4')),
              step_increase = 0.1, color = 'black',
              map_signif_level = function(p) sprintf("p = %.2g", p)) +
  scale_x_discrete('Cluster', expand = c(0.25, 0.5)) +
  scale_color_manual("Condition", values = c("#4d4b4c", "#4c517b", "#c63a41")) +
  facet_grid(.~cluster, switch = 'x') +
  theme_minimal() +
  theme(panel.spacing.x = unit(0, 'mm'),
        axis.text.x = element_blank(),
        axis.line = element_line(),
        strip.placement = 'outside') 
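
Two notes on the above. geom_signif runs wilcox.test unless its test argument is set, so the labels in this plot are Wilcoxon p-values rather than t-test p-values. And the error in the question arises because the second argument passed to t.test was a character vector of cluster names, not numbers. As a rough sketch of how the per-cluster t-test p-values could be tabulated in base R (the helper pairwise_p and the table p_table are names made up for illustration, and the simulated data from "Data used" is rebuilt here so the block runs on its own):

```r
# Rebuild the simulated data from the "Data used" section
set.seed(1)
filtered_lymph <- data.frame(
  cluster = rep(c("CD4+ T Cells", "Effector T Cells", "NK Cells",
                  "Progenitor T Cells"), each = 30),
  New_condition = rep(rep(c("NTX", "Anti-PD-1", "Anti-CTLA-4"), each = 10), 4),
  `total cells` = rnorm(120, 3, 0.5)^2, check.names = FALSE)
filtered_lymph[1:2] <- lapply(filtered_lymph[1:2], \(x) factor(x, unique(x)))

# Hypothetical helper: two-sided, unpaired (Welch) t-test between the
# control level and one treatment level, within a single cluster's rows
pairwise_p <- function(df, control, treatment) {
  x <- df[["total cells"]][df$New_condition == control]
  y <- df[["total cells"]][df$New_condition == treatment]
  t.test(x, y, alternative = "two.sided", paired = FALSE)$p.value
}

treatments <- c("Anti-PD-1", "Anti-CTLA-4")

# One row per cluster and treatment, each compared against "NTX"
p_table <- do.call(rbind, lapply(
  split(filtered_lymph, filtered_lymph$cluster),
  function(df) {
    data.frame(
      cluster = as.character(df$cluster[1]),
      group1  = "NTX",
      group2  = treatments,
      p       = vapply(treatments, function(tr) pairwise_p(df, "NTX", tr),
                       numeric(1), USE.NAMES = FALSE)
    )
  }
))
rownames(p_table) <- NULL
p_table
```

These p-values are not adjusted for multiple comparisons; p.adjust(p_table$p) could be applied if that is wanted.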


Data used

I used the same data that was created for the OP's previous question, reproduced here:

set.seed(1)

filtered_lymph <- data.frame(
  cluster = rep(c("CD4+ T Cells", "Effector T Cells", "NK Cells",
                  "Progenitor T Cells"), each = 30),
  New_condition = rep(rep(c("NTX", "Anti-PD-1", "Anti-CTLA-4"), each = 10), 4),
  `total cells` = rnorm(120, 3, 0.5)^2, check.names = FALSE)

filtered_lymph[1:2] <- lapply(filtered_lymph[1:2], \(x) factor(x, unique(x)))
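
Note that this sample data uses the column name New_condition and the level "Anti-CTLA-4", whereas the data in the question has New_Condition and "AC484". A minimal sketch of how the plot might be adapted to the question's names and switched to a t-test, assuming simulated stand-in data (sim_lymph, the second cluster name, and the values are invented for illustration and are not real measurements):

```r
library(ggplot2)
library(ggsignif)

set.seed(1)

# Simulated stand-in using the question's column names and levels
sim_lymph <- data.frame(
  cluster = rep(c("CD4+ Tcells", "Progenitor Tcells"), each = 30),
  New_Condition = factor(
    rep(rep(c("NTX", "Anti-PD-1", "AC484"), each = 10), 2),
    levels = c("NTX", "Anti-PD-1", "AC484")),
  `total cells` = rnorm(60, 3, 0.5)^2,
  check.names = FALSE)

p_ttest <- ggplot(sim_lymph, aes(New_Condition, `total cells`,
                                 color = New_Condition)) +
  geom_boxplot() +
  # test = "t.test" replaces the default wilcox.test; t.test itself
  # defaults to a two-sided, unpaired Welch test
  geom_signif(comparisons = list(c("NTX", "Anti-PD-1"),
                                 c("NTX", "AC484")),
              test = "t.test",
              step_increase = 0.1, color = "black",
              map_signif_level = function(p) sprintf("p = %.2g", p)) +
  facet_grid(. ~ cluster, switch = "x") +
  theme_minimal()

p_ttest
```

Because the plot is faceted by cluster, each comparison is computed within its own panel, which gives one pair of p-values per cluster.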