Significance testing across multi-level groupings using counts/proportions with missing values


I'm working with an example dataset that looks like this:

df <- data.frame(
  Treatment = c("A", "A", "A", "A", "A", "A",
                "A", "A", "A", "A", "A", "A",
                "B", "B", "B", "B", "B", "B",
                "B", "B", "B", "B", "B", "B"),
  Patient = c(1, 1, 1, 1, 1, 1,
              2, 2, 2, 2, 2, 2,
              3, 3, 3, 3, 3, 3,
              4, 4, 4, 4, 4, 4),
  Timepoint = c("PRE", "PRE", "PRE", "POST", "POST", "POST",
                "PRE", "PRE", "PRE", "POST", "POST", "POST",
                "PRE", "PRE", "PRE", "POST", "POST", "POST",
                "PRE", "PRE", "PRE", "POST", "POST", "POST"),
  Phenotype = c("NK", "T Cell", "Macrophage", "NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage", "NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage", "NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage", "NK", "T Cell", "Macrophage"),
  Count = c(523,235,2352,352,646,234,
            3463,525,646,234,725,264,
            1636,3153,455,134,646,253,
            464,252,464,276,364,353)
)

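I'm trying to make comparisons at two levels.

The first is between the PRE and POST timepoints for each phenotype within each patient, with output along the lines of: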

data.frame(
  Patient = c(1, 1, 1,
              2, 2, 2,
              3, 3, 3,
              4, 4, 4),
  Phenotype = c("NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage"),
  Pvalue = c(0, 0, 0,
             0, 0, 0,
             0, 0, 0,
             0, 0, 0)
)

The second is a higher-level comparison that uses the treatment grouping to produce something like:

data.frame(
  Treatment = c("A", "A", "A",
              "B", "B", "B"),
  Phenotype = c("NK", "T Cell", "Macrophage",
                "NK", "T Cell", "Macrophage"),
  Pvalue = c(0, 0, 0,
             0, 0, 0)
)

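Because I'm comparing proportions/counts here, I assume I should use a proportion test or a chi-squared test? I've ruled out paired tests because the number of observations is inconsistent, but I'm still not sure which of the two is better suited to this comparison.

I tried this with dplyr: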
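
For a single comparison, here is a minimal sketch under an assumption the question does not state: each phenotype is tested against all other counted cells for the same patient and timepoint. Framed that way, `prop.test` on two groups and `chisq.test` on the corresponding 2x2 table should agree in R, since both apply a continuity correction by default. The numbers are patient 1's NK counts from the example data; the variable names are only illustrative.

```r
# Patient 1, NK cells, counts copied from the example data.
nk    <- c(PRE = 523, POST = 352)        # NK counts at each timepoint
total <- c(PRE = 523 + 235 + 2352,       # all counted cells at PRE
           POST = 352 + 646 + 234)       # all counted cells at POST

# Two-sample test of equal proportions (NK fraction at PRE vs POST).
p_prop <- prop.test(x = nk, n = total)$p.value

# Chi-squared test on the same data laid out as a 2x2 table:
# rows = timepoint, columns = NK vs not NK.
p_chisq <- chisq.test(cbind(nk, total - nk))$p.value

# The two p-values are expected to match.
all.equal(p_prop, p_chisq)
```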

df %>% group_by(Phenotype, Timepoint, Patient) %>% 
  summarise(pvalue = chisq.test(n)$p.value) 

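But it fails because, in the real dataset, there are cases where a phenotype appears at one timepoint for a given patient but not at the other. What is the best way to run these kinds of tests in batch when a phenotype's count is 0 or NA in some cases? The real dataset is much larger than the dummy set I've provided, so running the tests by hand isn't a practical option.

Any input is appreciated!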
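
One possible way to handle the missing combinations, offered as a sketch rather than a definitive answer: make the absent rows explicit with `tidyr::complete()`, treat them as a count of 0, and then run one test per patient and phenotype. The toy data frame `toy` below is hypothetical (patient 2 has no NK row at POST), treating a missing or NA count as 0 is an assumption, and the test compares each phenotype against all other cells at the same timepoint, which is also an assumption. `fisher.test` is swapped in here as one option for tables that contain zeros; `chisq.test` or `prop.test` could be dropped into the same place.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: patient 2 has no "NK" row at POST.
toy <- data.frame(
  Treatment = "A",
  Patient   = c(1, 1, 1, 1, 2, 2, 2),
  Timepoint = c("PRE", "PRE", "POST", "POST", "PRE", "PRE", "POST"),
  Phenotype = c("NK", "T Cell", "NK", "T Cell", "NK", "T Cell", "T Cell"),
  Count     = c(523, 235, 352, 646, 3463, 525, 725)
)

per_patient <- toy %>%
  # Add a row for every Timepoint x Phenotype combination within each patient.
  complete(nesting(Treatment, Patient), Timepoint, Phenotype,
           fill = list(Count = 0)) %>%
  # Assumption: an NA count means the phenotype was not observed.
  mutate(Count = coalesce(Count, 0)) %>%
  # Total cells per patient and timepoint, used as the denominator.
  group_by(Patient, Timepoint) %>%
  mutate(Total = sum(Count)) %>%
  group_by(Treatment, Patient, Phenotype) %>%
  summarise(
    # 2x2 table: rows = PRE/POST, columns = this phenotype / everything else.
    Pvalue = fisher.test(matrix(
      c(Count[Timepoint == "PRE"],
        Total[Timepoint == "PRE"] - Count[Timepoint == "PRE"],
        Count[Timepoint == "POST"],
        Total[Timepoint == "POST"] - Count[Timepoint == "POST"]),
      nrow = 2, byrow = TRUE
    ))$p.value,
    .groups = "drop"
  )

per_patient
```

The treatment-level comparison could follow the same pattern by summing counts across patients within each treatment before testing, although pooling in that way ignores patient-to-patient variation.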

r dplyr group-by statistics chi-squared