How best to perform this inner join in R?


Here is the sample data:

    acct <- c("000012345", "000122452", "000122552", "000122621", "000122201", "000122365")
    run <- c("00000", "00000", "00000", "00000", "00000", "00000")
    ein <- c("11111", "22222", "33333", "44444", "55555", "11111")
    succuiacct <- c("", "", "", "", "", "")
    succrun <- c("", "", "", "", "", "")

    first <- data.frame(acct, run, ein, succuiacct, succrun)

    acct2 <- c("000012345", "00012346", "000122452", "000122453", "000122365", "000122777")
    run2 <- c("00000", "00000", "00000", "00000", "00000", "00000")
    succuiacct2 <- c("000200100", "000122914", "000200101", "000122995", "000200102", "0001233222")
    succrun <- c("00000", "00000", "00000", "00000", "00000", "00000")

    second <- data.frame(acct2, run2, succuiacct2, succrun)

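The goal is to fill the first table with the matching succuiacct2 value whenever first.acct + first.run equals second.acct2 + second.run2. I know that every run value is 00000 in this example, but in the larger data set I essentially need to concatenate the two columns so that together they form a unique identifier.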
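The concatenated identifier described above can be sketched in base R. This is only an illustration under assumptions: the names `key_first`, `key_second` and `idx` and the `"_"` separator are hypothetical, only the columns needed for the lookup are rebuilt, and each acct2 + run2 pair is assumed to occur at most once in `second` (`match()` returns only the first hit).

```r
# Sample data from the question (first is rebuilt without its empty columns)
first <- data.frame(
  acct = c("000012345", "000122452", "000122552", "000122621", "000122201", "000122365"),
  run  = rep("00000", 6),
  ein  = c("11111", "22222", "33333", "44444", "55555", "11111"),
  stringsAsFactors = FALSE
)
second <- data.frame(
  acct2       = c("000012345", "00012346", "000122452", "000122453", "000122365", "000122777"),
  run2        = rep("00000", 6),
  succuiacct2 = c("000200100", "000122914", "000200101", "000122995", "000200102", "0001233222"),
  succrun     = rep("00000", 6),
  stringsAsFactors = FALSE
)

# Hypothetical composite keys: account and run pasted together with a separator
key_first  <- paste(first$acct, first$run, sep = "_")
key_second <- paste(second$acct2, second$run2, sep = "_")

# For each row of `first`, the position of the same key in `second` (NA if absent)
idx <- match(key_first, key_second)

# Copy the successor columns across; rows without a match receive NA
first$succuiacct <- second$succuiacct2[idx]
first$succrun    <- second$succrun[idx]

print(first)
```

Joining on both columns at once, as a two-column `by` argument does, has the same effect without building the key by hand.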

The final result would look like this:

   acct          run          ein         succuiacct          succrun
000012345      00000        11111       000200100              00000
000122452      00000        22222       000200101              00000
000122552      00000        33333       "Blank"                "Blank"
r dplyr left-join inner-join
1 Answer
dplyr::inner_join(first, second, by = c("acct" = "acct2", "run" = "run2"))

Output:

       acct   run   ein succuiacct succrun.x succuiacct2 succrun.y
1 000012345 00000 11111                        000200100     00000
2 000122452 00000 22222                        000200101     00000
3 000122365 00000 11111                        000200102     00000

Notes:

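  1. Do not use "Blank" for missing values; use NA.
  2. In any case, where do the blanks come from in the first place? The actual data has no blanks in these rows (see the output above).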
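If the rows of `first` that have no match should stay in the result, as the expected output in the question suggests, a left join may be closer to what is wanted than an inner join. The following is a minimal sketch under assumptions, not a definitive answer: it drops the empty placeholder columns of `first` before joining, renames `succuiacct2` to `succuiacct` to mirror the requested layout, and assumes each acct2 + run2 pair is unique in `second`. The name `result` is hypothetical.

```r
library(dplyr)

# Sample data from the question
first <- data.frame(
  acct       = c("000012345", "000122452", "000122552", "000122621", "000122201", "000122365"),
  run        = rep("00000", 6),
  ein        = c("11111", "22222", "33333", "44444", "55555", "11111"),
  succuiacct = rep("", 6),
  succrun    = rep("", 6),
  stringsAsFactors = FALSE
)
second <- data.frame(
  acct2       = c("000012345", "00012346", "000122452", "000122453", "000122365", "000122777"),
  run2        = rep("00000", 6),
  succuiacct2 = c("000200100", "000122914", "000200101", "000122995", "000200102", "0001233222"),
  succrun     = rep("00000", 6),
  stringsAsFactors = FALSE
)

result <- first %>%
  # Drop the empty placeholders so the joined columns do not get .x/.y suffixes
  select(acct, run, ein) %>%
  # Keep every row of `first`; unmatched rows get NA in the columns from `second`
  left_join(second, by = c("acct" = "acct2", "run" = "run2")) %>%
  # Assumption: use the column name shown in the question's expected output
  rename(succuiacct = succuiacct2)

print(result)
```

All six rows of `first` are kept, and the three accounts without a counterpart in `second` show NA in place of "Blank".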