I have a data frame that includes a min column and a max column. For example:
df <- data.frame(min=c(2,4,3,3,2,6),max=c(2.9,5.9,3.9,4.9,7.9,7.9))
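I am interested in creating and assigning groups based on the overlap/non-overlap between the two columns. For example, if the min-max ranges of two rows do not overlap, they get two separate letter codes. However, if another row falls into both of those groups, it receives the concatenation of the other two groups' codes. For the data frame above, the result would look as follows: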
df$group <- c("a","b","c","bc","abcd","d")
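To make the overlap rule concrete, here is a minimal base R sketch of the pairwise test, assuming two ranges overlap when each one starts before the other ends (strict inequalities, so ranges that only touch at an endpoint do not count). The name `overlaps` is just an illustrative choice.

```r
df <- data.frame(min = c(2, 4, 3, 3, 2, 6),
                 max = c(2.9, 5.9, 3.9, 4.9, 7.9, 7.9))

# overlaps[i, j] is TRUE when row i starts before row j ends
# and row i ends after row j starts.
overlaps <- outer(df$min, df$max, "<") & outer(df$max, df$min, ">")

# Rows whose range overlaps row 4 (every row overlaps itself)
which(overlaps[4, ])  # rows 2, 3, 4, 5
```

Row 4 overlaps both row 2 and row 3, which do not overlap each other, so it receives both of their letters ("bc").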
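Essentially, this is similar to using letter codes to denote statistical significance, except that here they are based on minimum and maximum values. I have a large data frame and would like to automate this process, with the end goal of placing the letter codes above the error bars drawn by geom_errorbar in ggplot2.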
library(dplyr)  # provides if_else()

# make an empty column for the overlaps
df$overlap <- ""
# create letter index
letter <- 1
# note: the strict inequalities below assume min < max in every row
while (any(df$overlap == "")) {
  # get the first row that still has an empty overlap string
  row <- which(df$overlap == "")[1]
  # if the ranges overlap, then add the letter to the overlap column, otherwise add nothing
  new_vals <- if_else(df$min < df$max[row] & df$max > df$min[row], letters[letter], "")
  # increment letter index
  letter <- letter + 1
  # add the new values to the overlap column
  df$overlap <- paste0(df$overlap, new_vals)
}
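For the plotting step mentioned in the question, the following is one possible sketch rather than a definitive recipe. It assumes the letter codes are already stored in a column (here the `group` values from the question are typed in by hand) and introduces a hypothetical `id` column purely to give each row a position on the x-axis.

```r
library(ggplot2)

df <- data.frame(min = c(2, 4, 3, 3, 2, 6),
                 max = c(2.9, 5.9, 3.9, 4.9, 7.9, 7.9))
df$group <- c("a", "b", "c", "bc", "abcd", "d")  # letter codes from the question
df$id <- factor(seq_len(nrow(df)))               # hypothetical x-axis identifier

p <- ggplot(df, aes(x = id)) +
  # One error bar per row, spanning min to max
  geom_errorbar(aes(ymin = min, ymax = max), width = 0.2) +
  # Anchor each label at the top of its bar and nudge it upward
  geom_text(aes(y = max, label = group), vjust = -0.5) +
  labs(x = "row", y = "value")

print(p)
```

Setting `vjust = -0.5` lifts the text slightly above `max`, so the labels sit above the bars instead of on top of the caps.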