I can't believe how hard it has been to find a solution to this. I have two data tables whose rows and columns look like this:
library(data.table)
Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(20, 30, 10, NA, 20, NA)
acctf <- c(20, NA, 15, NA, 40, NA)
dt1 <- data.table(Country, Year, acctm, acctf)
Country Year acctm acctf
1 FRA 2010 20 20
2 FRA 2020 30 NA
3 DEU 2010 10 15
4 DEU 2020 NA NA
5 CHE 2010 20 40
6 CHE 2020 NA NA
Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(1, 1, 1, 60, 1, 70)
acctf <- c(1, 60, 1, 80, 1, 100)
dt2 <- data.table(Country, Year, acctm, acctf)
Country Year acctm acctf
1 FRA 2010 1 1
2 FRA 2020 1 60
3 DEU 2010 1 1
4 DEU 2020 60 80
5 CHE 2010 1 1
6 CHE 2020 70 100
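I need to create a new data table in which the NA values in dt1 are replaced with the values from dt2 that match on country, year, and variable, producing a table like this: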
Country Year acctm acctf
1 FRA 2010 20 20
2 FRA 2020 30 60
3 DEU 2010 10 15
4 DEU 2020 60 80
5 CHE 2010 20 40
6 CHE 2020 70 100
We can do this with a join, using on to match on the Country and Year columns:
library(data.table)
nm1 <- names(dt1)[3:4]
nm2 <- paste0("i.", nm1)
dt3 <- copy(dt1)
dt3[dt2, (nm1) := Map(function(x, y)
fifelse(is.na(x), y, x), mget(nm1), mget(nm2)), on = .(Country, Year)]
dt3
# Country Year acctm acctf
#1: FRA 2010 20 20
#2: FRA 2020 30 60
#3: DEU 2010 10 15
#4: DEU 2020 60 80
#5: CHE 2010 20 40
#6: CHE 2020 70 100
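To make the i. prefix in the join above easier to follow, here is a minimal sketch on a toy table. The names x, i, id, and v are made up for illustration and are not part of the question's data. Inside x[i, ..., on = ...], a bare column name refers to x, while the i. prefix refers to the column of the same name in i.

```r
library(data.table)

# Hypothetical toy tables: x has a missing value, i holds the replacements
x <- data.table(id = c("a", "b"), v = c(NA, 2))
i <- data.table(id = c("a", "b"), v = c(10, 20))

# Update join: keep x's value when present, otherwise take i's value (i.v)
x[i, v := fifelse(is.na(v), i.v, v), on = .(id)]

x$v
# 10  2
```

This is why nm2 is built as paste0("i.", nm1): mget(nm2) picks up the dt2 columns, and mget(nm1) picks up the dt3 columns.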
Or, to make it more compact, use fcoalesce from data.table (from a comment by @IceCreamToucan):
dt3[dt2, (nm1) := Map(fcoalesce, mget(nm1), mget(nm2)), on = .(Country, Year)]
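As a quick illustration of what fcoalesce does on its own, the sketch below applies it to two plain vectors. The vectors a and b are hypothetical and only serve the example. For each position it returns the first non-missing value, which is the same rule the fifelse(is.na(x), y, x) version spells out by hand.

```r
library(data.table)

# Hypothetical vectors: a has gaps, b supplies the fallback values
a <- c(1, NA, 3)
b <- c(10, 20, 30)

# Element-wise: take a where it is not NA, otherwise b
filled <- fcoalesce(a, b)
filled
# 1 20  3
```

Both arguments should have the same type (here both are double), as in the question's data.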
If the datasets have the same dimensions and identical values in Country and Year, another option is:
library(purrr)
library(dplyr)
list(dt1[, .(acctm, acctf)], dt2[, .(acctm, acctf)]) %>%
reduce(coalesce) %>%
bind_cols(dt1[, .(Country, Year)], .)
If the rows are ordered in exactly the same way in both tables, you can do this:
as.data.table(Map(function(x, y) ifelse(is.na(x), y, x), dt1, dt2))
# Country Year acctm acctf
# 1: FRA 2010 20 20
# 2: FRA 2020 30 60
# 3: DEU 2010 10 15
# 4: DEU 2020 60 80
# 5: CHE 2010 20 40
# 6: CHE 2020 70 100