I have two data.tables:

```r
library(data.table)

X = data.table(names = c("a", "a", "b", "b", "c", "c"), years = c("2001", "2002", "2001", "2002", "2001", "2002"), val.1 = 1:6, key = c("names", "years"))
X
```
| names | years | val.1 |
| -------- | -------- | -------- |
| a | 2001 | 1 |
| a | 2002 | 2 |
| b | 2001 | 3 |
| b | 2002 | 4 |
| c | 2001 | 5 |
| c | 2002 | 6 |
and

```r
X.update = data.table(names = c("a", "b", "b", "c", "d", "d", "d"), years = c("2003", "2002", "2003", "2003", "2001", "2002", "2003"), val.1 = 11:17)
X.update
```
| names | years | val.1 |
| -------- | -------- | -------- |
| a | 2003 | 11 |
| b | 2002 | 12 |
| b | 2003 | 13 |
| c | 2003 | 14 |
| d | 2001 | 15 |
| d | 2002 | 16 |
| d | 2003 | 17 |
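The task seems natural to me: `X.update` should replace every old value (`val.1`) that has the same `c("names", "years")` and add new entries everywhere else.

That is, the result should be:

```r
X.final
```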
| names | years | val.1 | note |
| -------- | -------- | -------- | -------- |
| a | 2001 | 1 | |
| a | 2002 | 2 | |
| a | 2003 | 11 | added for a in 2003: 11 instead of 3 |
| b | 2001 | 4 | |
| b | 2002 | 12 | corrected for b in 2002 |
| b | 2003 | 13 | added for b in 2003 |
| c | 2001 | 5 | |
| c | 2002 | 6 | |
| c | 2003 | 14 | added for c in 2003 |
| d | 2001 | 15 | added |
| d | 2002 | 16 | added |
| d | 2003 | 17 | added |
Since I need this for tables with 100,000 rows, I'm looking for an idiomatic (= fast) data.table solution.
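Use `merge.data.table()` to do a full outer join by specifying `all = TRUE`. Then `fcoalesce()` the result, i.e. fill in missing values in a vector by pulling successively from the candidate vectors in order: take the value from the update table first and otherwise fall back to the original value.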
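To make the `fcoalesce()` step concrete, here is a small illustration using the three `b` rows (2001, 2002, 2003) as plain vectors. The names `upd` and `orig` are made up for this example; `NA` in `upd` means "no update for this key" and `NA` in `orig` means "key did not exist before".

```r
library(data.table)

# Values for b/2001, b/2002, b/2003 after the full outer join
upd  <- c(NA, 12L, 13L)  # from X.update (NA: no update for this key)
orig <- c(3L,  4L, NA)   # from X        (NA: key is new)

# fcoalesce() returns, position by position, the first non-NA value,
# scanning its arguments left to right (all arguments must share one type)
fcoalesce(upd, orig)
# [1]  3 12 13
```

Because `upd` is listed first, an update always wins; the original value is used only where no update exists.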
```r
merge(
  X.update,
  X,
  by = c("names", "years"),
  all = TRUE,                             # full outer join
  suffixes = c("_update", "_original")
)[, val.1 := fcoalesce(val.1_update, val.1_original)][,  # prefer the updated value
  `:=`(val.1_update = NULL, val.1_original = NULL)       # drop the helper columns
][]
```
Output:

```
     names  years val.1
    <char> <char> <int>
 1:      a   2001     1
 2:      a   2002     2
 3:      a   2003    11
 4:      b   2001     3
 5:      b   2002    12
 6:      b   2003    13
 7:      c   2001     5
 8:      c   2002     6
 9:      c   2003    14
10:      d   2001    15
11:      d   2002    16
12:      d   2003    17
```
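To check where this output departs from the expected table in the question, one can rebuild both tables and take a set difference with `fsetdiff()`. This is a self-contained sketch; `res` is a name chosen here for illustration, and `X.final` is re-typed by hand from the question's table (including its `4` for `b`/`2001`).

```r
library(data.table)

X <- data.table(names = c("a", "a", "b", "b", "c", "c"),
                years = c("2001", "2002", "2001", "2002", "2001", "2002"),
                val.1 = 1:6, key = c("names", "years"))
X.update <- data.table(names = c("a", "b", "b", "c", "d", "d", "d"),
                       years = c("2003", "2002", "2003", "2003", "2001", "2002", "2003"),
                       val.1 = 11:17)

# The merge + fcoalesce approach
res <- merge(X.update, X, by = c("names", "years"), all = TRUE,
             suffixes = c("_update", "_original"))[
  , val.1 := fcoalesce(val.1_update, val.1_original)][
  , c("val.1_update", "val.1_original") := NULL][]

# Expected table as written in the question
X.final <- data.table(
  names = rep(c("a", "b", "c", "d"), each = 3),
  years = rep(c("2001", "2002", "2003"), times = 4),
  val.1 = c(1L, 2L, 11L, 4L, 12L, 13L, 5L, 6L, 14L, 15L, 16L, 17L))

# Rows of res that do not appear in X.final
fsetdiff(res, X.final)
```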
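Note that this differs from the expected output for `b` in `2001`, but since the original value is `3` and no new value overwrites it, I believe this is a mistake in the expected output.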
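A possible alternative, not part of the answer above and not benchmarked here, is to stack both tables and keep only the last row per key with `unique(..., fromLast = TRUE)`. This sketch assumes `X` and `X.update` have the same columns; unlike the `fcoalesce()` version, it replaces whole rows, so with extra non-key columns an update row would also overwrite those. The name `res2` is illustrative.

```r
library(data.table)

X <- data.table(names = c("a", "a", "b", "b", "c", "c"),
                years = c("2001", "2002", "2001", "2002", "2001", "2002"),
                val.1 = 1:6, key = c("names", "years"))
X.update <- data.table(names = c("a", "b", "b", "c", "d", "d", "d"),
                       years = c("2003", "2002", "2003", "2003", "2001", "2002", "2003"),
                       val.1 = 11:17)

# Original rows first, updates last: for each (names, years) pair keep the
# LAST occurrence, so an update row wins over the original one
res2 <- unique(rbind(X, X.update), by = c("names", "years"), fromLast = TRUE)

# Sort by the key columns so the row order matches the merge() result
setkeyv(res2, c("names", "years"))
res2
```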