Suppose I have a column containing the following:
c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
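For each entry that has no underscore, I want to find the first underscore in the row below it and copy that underscore, together with the rest of that string, onto the current row. The desired output is: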
c("Q9NZ_1036_1037_1_1_S1036", "Q69Z_1036_1037_1_1_S1036", "A3K8_567_789_1_1_T567", "P123_567_789_1_1_T567")
x = c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
## identify target items without underscores
target_rows = grep(pattern = "_", x, invert = TRUE)
## remove everything up to the first underscore in the following rows
next_row_after_underscore = sub(pattern = "[^_]*_", replacement = "_", x[target_rows + 1])
## paste together
x[target_rows] = paste0(x[target_rows], next_row_after_underscore)
x
# [1] "Q9NZ_1036_1037_1_1_S1036" "Q69Z_1036_1037_1_1_S1036" "A3K8_567_789_1_1_T567"
# [4] "P123_567_789_1_1_T567"
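The approach above pairs each underscore-free entry with the row immediately below it. That assumes such entries never occur twice in a row and never occur last: for two consecutive entries like "Q9NZ", "B2C4", `sub()` finds no underscore in "B2C4" and the result is "Q9NZB2C4", and for a last entry `x[target_rows + 1]` is `NA`, giving a string ending in "NA". If that can happen in your data, a backward fill (the same idea as `tidyr::fill(.direction = "up")`) avoids both problems. Below is a minimal base R sketch. The function name `fill_suffix_up` and the extra entries "B2C4" and "Z9Z9" in the example are hypothetical, for illustration only.

```r
# Hypothetical helper: give each underscore-free entry the suffix of the
# nearest entry BELOW it that contains an underscore (a backward fill),
# so runs of consecutive underscore-free entries are handled as well.
fill_suffix_up <- function(x) {
  has_us <- grepl("_", x, fixed = TRUE)
  # Suffix starting at the first underscore, or NA where there is none
  suffix <- ifelse(has_us, sub("^[^_]*", "", x), NA_character_)
  # Walk from the bottom up, carrying the most recent suffix seen
  carry <- NA_character_
  for (i in rev(seq_along(x))) {
    if (is.na(suffix[i])) suffix[i] <- carry else carry <- suffix[i]
  }
  # Only rewrite entries that lacked an underscore and have a suffix below them;
  # a trailing underscore-free entry is left unchanged instead of getting "NA"
  fix <- !has_us & !is.na(suffix)
  x[fix] <- paste0(x[fix], suffix[fix])
  x
}

y <- c("Q9NZ", "B2C4", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567", "Z9Z9")
fill_suffix_up(y)
# [1] "Q9NZ_1036_1037_1_1_S1036" "B2C4_1036_1037_1_1_S1036" "Q69Z_1036_1037_1_1_S1036"
# [4] "A3K8_567_789_1_1_T567"    "P123_567_789_1_1_T567"    "Z9Z9"
```

On the original four-element vector it returns the same result as the approach above.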
Here is a dplyr + tidyr approach:
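## %>% and transmute() come from dplyr; separate_wider_delim() needs tidyr >= 1.3.0
library(dplyr)
x = c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
data.frame(x) %>%
  ## split at the first underscore; entries without one get suffix = NA
  tidyr::separate_wider_delim(x, delim = "_", names = c("prefix", "suffix"), too_many = "merge", too_few = "align_start") %>%
  ## fill each missing suffix from the next non-missing row below it
  tidyr::fill(suffix, .direction = "up") %>%
  transmute(value = paste(prefix, suffix, sep = "_"))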
which returns:
value
<chr>
1 Q9NZ_1036_1037_1_1_S1036
2 Q69Z_1036_1037_1_1_S1036
3 A3K8_567_789_1_1_T567
4 P123_567_789_1_1_T567
This may be useful if you want to keep the data in a data frame.