In R, I want to append everything after the underscore to the string in the previous row, so that both rows have the same string after the underscore


Suppose I have a column containing the following:

c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")

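For each entry without an underscore, I want to find the first underscore in the row below it and copy that underscore, plus the rest of that string, onto the end of the current entry. The desired output is: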

c("Q9NZ_1036_1037_1_1_S1036", "Q69Z_1036_1037_1_1_S1036", "A3K8_567_789_1_1_T567", "P123_567_789_1_1_T567")

r string character
2 Answers
x = c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")

## identify target items without underscores
target_rows = grep(pattern = "_", x, invert = TRUE)

## remove everything up to the first underscore in the following rows
next_row_after_underscore = sub(pattern = "[^_]*_", replacement = "_", x[target_rows + 1])

## paste together
x[target_rows] = paste0(x[target_rows], next_row_after_underscore)
x
# [1] "Q9NZ_1036_1037_1_1_S1036" "Q69Z_1036_1037_1_1_S1036" "A3K8_567_789_1_1_T567"   
# [4] "P123_567_789_1_1_T567"
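This approach assumes every entry without an underscore is immediately followed by one that has an underscore. For `c("A", "B", "C_1_2")` the first element becomes `"AB"`, and a final entry without an underscore becomes e.g. `"BNA"`, because `x[target_rows + 1]` indexes past the end of the vector and returns `NA`. A minimal base-R sketch, assuming such entries should instead take the suffix of the nearest entry below them that has one; `fill_suffix_up()` is a hypothetical helper name, not part of base R:

```r
# Hypothetical helper (not from base R or the answer above).
# Assumption: a prefix-only entry inherits the suffix of the nearest entry
# below it that contains an underscore.
fill_suffix_up <- function(x) {
  has_suffix <- grepl("_", x, fixed = TRUE)
  # suffix = everything from the first underscore onward; NA if there is none
  suffix <- ifelse(has_suffix, sub("^[^_]*", "", x), NA_character_)
  # walk from the bottom up so missing suffixes are copied from the row below
  for (i in rev(seq_along(x))) {
    if (is.na(suffix[i]) && i < length(x)) suffix[i] <- suffix[i + 1]
  }
  # append the inherited suffix; entries with nothing below them stay unchanged
  ifelse(!has_suffix & !is.na(suffix), paste0(x, suffix), x)
}

x <- c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
fill_suffix_up(x)
```

On the question's vector this gives the desired output; `fill_suffix_up(c("A", "B", "C_1_2"))` gives `"A_1_2" "B_1_2" "C_1_2"`.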


Here is a dplyr + tidyr approach:

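library(dplyr)  # for %>% and transmute(); separate_wider_delim() needs tidyr >= 1.3.0
data.frame(x) %>% 
  tidyr::separate_wider_delim(x, delim = "_", names = c("prefix", "suffix"), too_many = "merge", too_few = "align_start") %>% 
  tidyr::fill(suffix, .direction = "up") %>%   # rows with no underscore take the suffix from the row below
  transmute(value = paste(prefix, suffix, sep = "_"))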

This returns:

  value                   
  <chr>                   
1 Q9NZ_1036_1037_1_1_S1036
2 Q69Z_1036_1037_1_1_S1036
3 A3K8_567_789_1_1_T567   
4 P123_567_789_1_1_T567   

This may be more convenient if your data are already in a data.frame.
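In this pipeline, a final row without an underscore has nothing below it to fill from, so `paste()` turns its `NA` suffix into the literal text `"NA"` (for example `"Z999_NA"`). A hedged variant, assuming such rows should simply keep their original value, guards the paste with `dplyr::if_else()`; the extra `"Z999"` element is made-up test data:

```r
library(dplyr)  # for %>%, transmute() and if_else(); needs tidyr >= 1.3.0 installed

# "Z999" is a made-up trailing entry with no suffix below it
x <- c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567", "Z999")

res <- data.frame(x) %>%
  tidyr::separate_wider_delim(x, delim = "_", names = c("prefix", "suffix"),
                              too_many = "merge", too_few = "align_start") %>%
  tidyr::fill(suffix, .direction = "up") %>%
  # keep the bare prefix when no suffix was found, instead of producing "Z999_NA"
  transmute(value = if_else(is.na(suffix), prefix, paste(prefix, suffix, sep = "_")))

res$value
```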
