Below is the sample data:
acct <- c("000012345","000122452","000122552","000122621","000122201","000122365")
run <- c("00000","00000","00000","00000","00000","00000")
ein <- c("11111","22222","33333","44444","55555","11111")
succuiacct <- c("","","","","","")
succrun <- c("","","","","","")
first <- data.frame(acct, run, ein, succuiacct,succrun)
acct2 <- c("000012345","00012346","000122452","000122453","000122365","000122777")
run2 <- c("00000","00000","00000","00000","00000","00000")
succuiacct2 <- c("000200100", "000122914","000200101","000122995","000200102","0001233222")
succrun <- c("00000","00000","00000","00000","00000","00000")
second <- data.frame(acct2, run2, succuiacct2, succrun)
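The goal is to fill the first table with the matching succuiacct2 value whenever first.acct + first.run equals second.acct2 + second.run2. I know that every run value is 00000 in this example, but in the larger dataset I essentially need to concatenate the two columns so that together they form a unique identifier.
The final result would look like this: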
acct run ein succuiacct succrun
000012345 00000 11111 000200100 00000
000122452 00000 22222 000200101 00000
000122552 00000 33333 "Blank" "Blank"
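The concatenated identifier described above can be sketched in base R. This is only a minimal illustration, not necessarily how the larger dataset should be handled: the names key1, key2, idx and hit are hypothetical, the "_" separator is an assumption, and the sample data is rebuilt so the block runs on its own.

```r
# Sample data from the question, rebuilt so this block is self-contained
first <- data.frame(
  acct = c("000012345", "000122452", "000122552", "000122621", "000122201", "000122365"),
  run = rep("00000", 6),
  ein = c("11111", "22222", "33333", "44444", "55555", "11111"),
  succuiacct = "",
  succrun = "",
  stringsAsFactors = FALSE
)
second <- data.frame(
  acct2 = c("000012345", "00012346", "000122452", "000122453", "000122365", "000122777"),
  run2 = rep("00000", 6),
  succuiacct2 = c("000200100", "000122914", "000200101", "000122995", "000200102", "0001233222"),
  succrun = rep("00000", 6),
  stringsAsFactors = FALSE
)

# Composite keys (hypothetical names); the "_" separator is an assumption
# that keeps the acct and run parts from running together
key1 <- paste(first$acct, first$run, sep = "_")
key2 <- paste(second$acct2, second$run2, sep = "_")

# Position of each first-table key in the second table (NA when absent)
idx <- match(key1, key2)
hit <- !is.na(idx)

# Fill only the matched rows; unmatched rows keep their empty strings
first$succuiacct[hit] <- second$succuiacct2[idx[hit]]
first$succrun[hit] <- second$succrun[idx[hit]]
```

Note that match() returns only the first matching position, so this sketch assumes each acct2 + run2 combination appears at most once in the second table.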
dplyr::inner_join(first, second, by = c("acct" = "acct2", "run" = "run2"))
Output:
       acct   run   ein succuiacct succrun.x succuiacct2 succrun.y
1 000012345 00000 11111                        000200100     00000
2 000122452 00000 22222                        000200101     00000
3 000122365 00000 11111                        000200102     00000
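An inner join keeps only the rows that have a match in both tables, so accounts such as 000122552, which the expected result shows with blank successor fields, are dropped. If those rows should be kept, a left join may be closer to what was asked. The following is a minimal sketch under that assumption; the name result is hypothetical, and the empty placeholder columns are dropped from first before joining so no .x/.y suffixes appear.

```r
library(dplyr)

# Sample data from the question, rebuilt so this block is self-contained
first <- data.frame(
  acct = c("000012345", "000122452", "000122552", "000122621", "000122201", "000122365"),
  run = rep("00000", 6),
  ein = c("11111", "22222", "33333", "44444", "55555", "11111"),
  succuiacct = "",
  succrun = "",
  stringsAsFactors = FALSE
)
second <- data.frame(
  acct2 = c("000012345", "00012346", "000122452", "000122453", "000122365", "000122777"),
  run2 = rep("00000", 6),
  succuiacct2 = c("000200100", "000122914", "000200101", "000122995", "000200102", "0001233222"),
  succrun = rep("00000", 6),
  stringsAsFactors = FALSE
)

result <- first %>%
  select(acct, run, ein) %>%                       # drop the empty placeholder columns
  left_join(second, by = c("acct" = "acct2", "run" = "run2")) %>%
  rename(succuiacct = succuiacct2) %>%             # match the column name in the first table
  mutate(
    succuiacct = coalesce(succuiacct, ""),         # unmatched rows: NA becomes blank
    succrun = coalesce(succrun, "")
  )
```

Joining on both columns at once makes a separate concatenated key unnecessary. If the second table can contain the same acct2 + run2 pair more than once, a left join will repeat the corresponding row of the first table, so it may be worth checking for duplicates first.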
Notes: