Create and assign groups based on overlapping/non-overlapping values between two columns


I have a data frame that includes a minimum column and a maximum column. For example,

df <- data.frame(min=c(2,4,3,3,2,6),max=c(2.9,5.9,3.9,4.9,7.9,7.9))

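I want to create and assign groups based on whether the [min, max] ranges of the rows overlap. For example, if the ranges of two rows do not overlap, each row gets its own letter code. If another row overlaps both of those groups, it receives the concatenation of their letter codes. For the data frame above, the result would be: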

df$group <- c("a","b","c","bc","abcd","d")

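Essentially, this is similar to using letter codes to denote statistical significance, except that here the codes are based on minimum and maximum values. I have a large data frame and would like to automate this process. The end goal is to place the letter codes above error bars drawn with geom_errorbar in ggplot2.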

r dataframe grouping variable-assignment clustering-key
1 Answer
# if_else() comes from dplyr
library(dplyr)

# make an empty column for the overlaps
df$overlap <- ""

# index into the built-in `letters` vector ("a", "b", ...)
letter <- 1

while (any(df$overlap == "")) {
    # get the first row that still has an empty overlap string
    row <- which(df$overlap == "")[1]

    # rows whose range overlaps this row's range get the current letter, the rest get nothing
    new_vals <- if_else(df$min < df$max[row] & df$max > df$min[row], letters[letter], "")

    # increment letter index
    letter <- letter + 1

    # append the new values to the overlap column
    df$overlap <- paste0(df$overlap, new_vals)
}
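For reuse on a larger data frame, the loop above could be wrapped in a function. The following is only a sketch in base R (no dplyr needed): the name assign_overlap_groups is hypothetical, it assumes there are no NA values and that at most 26 letters are needed, and it forces every seed row into its own group so that a zero-width range (min equal to max) cannot stall the loop. The plotting part is likewise an assumption about how the labels might be placed; it runs only if ggplot2 is installed and uses a made-up id column for the x axis.

```r
# Hypothetical helper (not part of the original answer): base-R version of the loop.
# Assumes no NA values in `min`/`max` and that 26 letters are enough.
assign_overlap_groups <- function(min, max) {
  stopifnot(length(min) == length(max))
  groups <- character(length(min))
  letter <- 1
  while (any(groups == "")) {
    if (letter > length(letters)) stop("more than 26 letter codes would be needed")
    # first row that has not received any letter yet
    row <- which(groups == "")[1]
    # rows whose range overlaps the range of `row` (strict inequalities, as in the loop above)
    hit <- min < max[row] & max > min[row]
    # the seed row always belongs to its own group, even if its range has zero width
    hit[row] <- TRUE
    groups[hit] <- paste0(groups[hit], letters[letter])
    letter <- letter + 1
  }
  groups
}

df <- data.frame(min = c(2, 4, 3, 3, 2, 6),
                 max = c(2.9, 5.9, 3.9, 4.9, 7.9, 7.9))
df$group <- assign_overlap_groups(df$min, df$max)
# df$group is now "a" "b" "c" "bc" "abcd" "d"

# Optional: put each code just above its error bar (only if ggplot2 is available).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  df$id <- factor(seq_len(nrow(df)))  # hypothetical x variable, one bar per row
  p <- ggplot(df, aes(x = id)) +
    geom_errorbar(aes(ymin = min, ymax = max), width = 0.2) +
    geom_text(aes(y = max, label = group), vjust = -0.5)
  # print(p) draws the plot
}
```

Because letters are handed out in row order, sorting the data frame differently can change which letter a group receives, although the overlap structure stays the same.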