I want to make a ggplot boxplot, and I think I need to reshape my data from this:
markerID V1 V2 V3
1 0.8636364 0.8409091 0.7954545
2 0.8863636 0.8409091 0.8409091
to this:
markerID replicate rate
1 1 0.8636364
1 2 0.8409091
1 3 0.7954545
2 1 0.8863636
2 2 0.8409091
2 3 0.8409091
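For readability, I have only shown part of the data.
Once the data is in this format, I think I can group by markerID and make the boxplot. The number of columns and rows may vary, so I'm not sure how to apply functions such as melt() or pivot_longer().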
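If you would rather not depend on a package, the same reshape can be sketched in base R. This is a minimal sketch that assumes the first column is markerID and every other column holds one replicate. Because it counts those columns instead of naming them, it adapts to any number of replicates or markers. The data frame below is a hypothetical excerpt built from the first two markers shown above.

```r
# Hypothetical excerpt: the first two markers from the question, wide format
df <- data.frame(
  markerID = c("1", "2"),
  V1 = c(0.8636364, 0.8863636),
  V2 = c(0.8409091, 0.8409091),
  V3 = c(0.7954545, 0.8409091),
  stringsAsFactors = FALSE
)

# Assumption: every column except markerID is one replicate
rate_cols <- setdiff(names(df), "markerID")
n_rep <- length(rate_cols)

# unlist() stacks the rate columns one after another (all of V1, then V2, ...),
# so markerID repeats as a whole block and each replicate index repeats nrow(df) times
long <- data.frame(
  markerID  = rep(df$markerID, times = n_rep),
  replicate = rep(seq_len(n_rep), each = nrow(df)),
  rate      = unlist(df[rate_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort by marker (numerically), then replicate, to match the target layout
long <- long[order(as.integer(long$markerID), long$replicate), ]
rownames(long) <- NULL
print(long)
```

The replicate number here comes from the column position, not from the column name, so it still works if the columns are not literally named V1, V2, and so on.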
Sample data:
df <- structure(list(markerID = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10", "11", "12", "13", "14", "15"), V1 = c(0.863636363636364,
0.886363636363636, 0.886363636363636, 0.795454545454545, 0.795454545454545,
0.863636363636364, 0.931818181818182, 0.909090909090909, 0.840909090909091,
0.863636363636364, 0.886363636363636, 0.795454545454545, 0.818181818181818,
0.863636363636364, 0.886363636363636), V2 = c(0.840909090909091,
0.840909090909091, 0.909090909090909, 0.772727272727273, 0.772727272727273,
0.909090909090909, 0.886363636363636, 0.886363636363636, 0.954545454545455,
0.75, 0.818181818181818, 0.772727272727273, 0.681818181818182,
0.863636363636364, 0.840909090909091), V3 = c(0.795454545454545,
0.840909090909091, 0.886363636363636, 0.818181818181818, 0.818181818181818,
0.795454545454545, 0.818181818181818, 0.863636363636364, 0.818181818181818,
0.818181818181818, 0.931818181818182, 0.772727272727273, 0.772727272727273,
0.886363636363636, 0.886363636363636)), class = "data.frame", row.names = c(NA,
-15L))
An approach using pivot_longer():
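library(tidyr)
# -markerID selects every other column, however many there are;
# names_prefix = "V" strips the "V" so "V1" becomes "1"
pivot_longer(df, -markerID, names_prefix = "V", names_to = "replicate", values_to = "rate")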
# A tibble: 45 × 3
markerID replicate rate
<chr> <chr> <dbl>
1 1 1 0.864
2 1 2 0.841
3 1 3 0.795
4 2 1 0.886
5 2 2 0.841
6 2 3 0.841
7 3 1 0.886
8 3 2 0.909
9 3 3 0.886
10 4 1 0.795
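From the long format, the boxplot itself is a single geom_boxplot() call. This is a minimal sketch, assuming ggplot2 is installed and a tidyr version recent enough to support the names_transform argument of pivot_longer(). A three-marker excerpt of the sample data is rebuilt here so the snippet runs on its own. Two details are worth handling. First, names_transform stores replicate as an integer instead of the <chr> column shown above. Second, markerID is converted to a factor in numeric order. Left as character strings, ggplot2 would order the 15 markers alphabetically as "1", "10", "11", ..., "2".

```r
library(tidyr)
library(ggplot2)

# Hypothetical excerpt: the first three markers from the sample data (wide format)
df <- data.frame(
  markerID = c("1", "2", "3"),
  V1 = c(0.8636364, 0.8863636, 0.8863636),
  V2 = c(0.8409091, 0.8409091, 0.9090909),
  V3 = c(0.7954545, 0.8409091, 0.8863636),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  df, -markerID,
  names_prefix = "V",                              # "V1" -> "1"
  names_to = "replicate",
  names_transform = list(replicate = as.integer),  # store replicate as integer
  values_to = "rate"
)

# factor() of integers orders levels numerically (1, 2, ..., 10, 11, ...)
# instead of alphabetically ("1", "10", "11", "2", ...)
long$markerID <- factor(as.integer(long$markerID))

# One box per marker, summarising the rates across its replicates
p <- ggplot(long, aes(x = markerID, y = rate)) +
  geom_boxplot()
print(p)
```

With only three replicates per marker, each box is drawn from three points. You may also want to overlay the raw values, for example with geom_point() or geom_jitter().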