R: joining data frames by index within a string

Question (votes: 0, answers: 2)

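I am trying to merge/join two data frames, `df` and `df2`.

`df` was generated from a character vector that was cut into `string` pieces at each `position` (the 4th, 10th, 12th, ... character). The initial vector can be rebuilt with `vec1 <- paste(df$string, collapse = "")`.

`df2` holds a `name` for some of the characters in `vec1`. For example, the third and fifth characters of `vec1` are `P` and `A`, and their `name`s are `pear` and `apple`, respectively.
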
df <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
             "string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame("character" = c("P", "A", "L", "A", "P", "G"),
              "position" = c(3, 5, 9, 21, 26, 29),
              "name" = c("pear", "apple", "lemon", "apricot", "peach", "grape"))

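As a quick sanity check of the setup described above (a sketch, not part of the original post; `chars_at_pos` is a name introduced here for illustration), the full string can be rebuilt and the `position`/`character` pairs in `df2` compared against it. It assumes R >= 4.0, where `data.frame()` keeps strings as character vectors.

```r
# Data as given in the question
df <- data.frame(position = c(4, 10, 12, 20, 27, 30),
                 string = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame(character = c("P", "A", "L", "A", "P", "G"),
                  position = c(3, 5, 9, 21, 26, 29),
                  name = c("pear", "apple", "lemon", "apricot", "peach", "grape"))

# Rebuild the original vector from its pieces
vec1 <- paste(df$string, collapse = "")

# Each `position` in df is the cumulative end index of its piece within vec1
all(cumsum(nchar(df$string)) == df$position)

# Characters of vec1 at the absolute positions listed in df2
chars_at_pos <- substring(vec1, df2$position, df2$position)
all(chars_at_pos == df2$character)
```

Both checks should come back `TRUE` for the data shown, which confirms that `df2$position` indexes into the concatenated string rather than into the individual pieces.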
I want to combine `df` and `df2` into `df3`, which shows which `name`s occur in each `string` of `df`, as shown below. Is there a good way to do this?

df3 <- data.frame("position" = c(4, 10, 12, 20, 27, 30),
              "string" = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"),
              "name" = c("pear", "apple, lemon", NA, NA, "apricot, peach", "grape"))
Tags: r, dplyr, merge
2 Answers

0 votes
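library(tidyverse)  # separate_longer_position() needs tidyr >= 1.3.0

df |>
  left_join(
    df |>
      separate_longer_position(string, width = 1) |>  # one row per character
      mutate(pos = row_number()) |>                   # absolute index in the full string
      left_join(df2, join_by(pos == position, string == character)) |>
      filter(!is.na(name)) |>                         # keep only characters that have a name
      summarize(name = paste(name, collapse = ", "), .by = position),
    by = join_by(position)
  )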
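To see why the join in this pipeline lines up, it may help to look at the intermediate tables. The sketch below reuses the question's data; `long` and `matched` are names introduced here for illustration, and `inner_join()` is used as a shorthand for the `left_join()` plus `filter(!is.na(name))` steps. It assumes dplyr >= 1.1.0 and tidyr >= 1.3.0.

```r
library(dplyr)
library(tidyr)

df <- data.frame(position = c(4, 10, 12, 20, 27, 30),
                 string = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame(character = c("P", "A", "L", "A", "P", "G"),
                  position = c(3, 5, 9, 21, 26, 29),
                  name = c("pear", "apple", "lemon", "apricot", "peach", "grape"))

# One row per character; `position` still identifies the piece it came from,
# while `pos` is the character's absolute index in the concatenated string
long <- df |>
  separate_longer_position(string, width = 1) |>
  mutate(pos = row_number())

# Keep only characters whose absolute index and letter both appear in df2
matched <- long |>
  inner_join(df2, join_by(pos == position, string == character))

matched
```

Because the key columns from `df2` are dropped in an equality join, the `position` column that survives in `matched` is the one from `df`, which is what the later `summarize(..., .by = position)` groups on.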

0 votes

One option is to use `sapply` with `substring`, computing the start/stop positions accordingly:

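df$name <- sapply(seq_len(nrow(df)), \(i) {
  # shift the absolute positions so they are relative to the start of string i
  pos <- df2$position - ifelse(i == 1, 0, df$position[i - 1])
  # out-of-range positions yield "" from substring() and therefore never match
  paste(df2$name[df2$character == substring(df$string[i], pos, pos)], collapse = ", ")
})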

df
  position   string           name
1        4     MPPA           pear
2       10   APARLA   apple, lemon
3       12       LA               
4       20 LGLGLWLG               
5       27  ALAGGPG apricot, peach
6       30      RGC          grape
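The start/stop idea can also be spelled out explicitly. The following is a minimal sketch that swaps the `sapply`/`substring` loop for base R's `findInterval()`; the names `start_pos`, `stop_pos`, `row_id`, `keep`, and `groups` are introduced here for illustration. Unlike the output above, pieces without a match get `NA` rather than an empty string, which matches the `df3` requested in the question.

```r
df <- data.frame(position = c(4, 10, 12, 20, 27, 30),
                 string = c("MPPA", "APARLA", "LA", "LGLGLWLG", "ALAGGPG", "RGC"))
df2 <- data.frame(character = c("P", "A", "L", "A", "P", "G"),
                  position = c(3, 5, 9, 21, 26, 29),
                  name = c("pear", "apple", "lemon", "apricot", "peach", "grape"))

# Start and stop index of each piece within the concatenated string
stop_pos  <- df$position
start_pos <- c(1, head(stop_pos, -1) + 1)

# Keep only df2 rows whose character really sits at the stated position
vec1 <- paste(df$string, collapse = "")
keep <- substring(vec1, df2$position, df2$position) == df2$character

# Index of the piece whose [start, stop] range contains each position
row_id <- findInterval(df2$position[keep], start_pos)

# Collect the names per piece; pieces without any match become NA
groups <- split(df2$name[keep], factor(row_id, levels = seq_len(nrow(df))))
df$name <- vapply(
  groups,
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1),
  USE.NAMES = FALSE
)

df
```

This variant looks up each `df2` row once instead of testing every position against every piece, at the cost of assuming that `df$position` is sorted in increasing order.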