如何在 R 中的 data.table 中创建一个列来过滤原始数据表并根据条件返回一些值

问题描述 投票:0回答:1

我有一个相当大的 data.table,我想创建一个新列,根据当前行中定义的过滤器搜索同一 data.table 的不同行。

参见下面的示例:

library(data.table)
dt<-data.table(a=seq.Date(as.Date("2023-11-01"), as.Date("2023-11-30"), by="day"), 
               b=c(1:27, 46, 34, 101))
dt$a_m1<-c(seq.Date(as.Date("2023-11-16"), as.Date("2023-11-30"), by="day"), seq.Date(as.Date("2023-11-01"), as.Date("2023-11-15"), by="day"))

            a  b       a_m1
 1: 2023-11-01  1 2023-11-16
 2: 2023-11-02  2 2023-11-17
 3: 2023-11-03  3 2023-11-18
 4: 2023-11-04  4 2023-11-19
 5: 2023-11-05  5 2023-11-20
 6: 2023-11-06  6 2023-11-21
 7: 2023-11-07  7 2023-11-22
 8: 2023-11-08  8 2023-11-23
 9: 2023-11-09  9 2023-11-24
10: 2023-11-10 10 2023-11-25
11: 2023-11-11 11 2023-11-26
12: 2023-11-12 12 2023-11-27
13: 2023-11-13 13 2023-11-28
14: 2023-11-14 14 2023-11-29
15: 2023-11-15 15 2023-11-30
16: 2023-11-16 16 2023-11-01
17: 2023-11-17 17 2023-11-02
18: 2023-11-18 18 2023-11-03
19: 2023-11-19 19 2023-11-04
20: 2023-11-20 20 2023-11-05
21: 2023-11-21 21 2023-11-06
22: 2023-11-22 22 2023-11-07
23: 2023-11-23 23 2023-11-08
24: 2023-11-24 24 2023-11-09
25: 2023-11-25 25 2023-11-10
26: 2023-11-26 26 2023-11-11
27: 2023-11-27 27 2023-11-12
28: 2023-11-28 46 2023-11-13
29: 2023-11-29 34 2023-11-14
30: 2023-11-30 101 2023-11-15 

期望的输出是:

             a  b       a_m1 b_m1
 1: 2023-11-01  1 2023-11-16   16
 2: 2023-11-02  2 2023-11-17   17
 3: 2023-11-03  3 2023-11-18   18
 4: 2023-11-04  4 2023-11-19   19
 5: 2023-11-05  5 2023-11-20   20
 6: 2023-11-06  6 2023-11-21   21
 7: 2023-11-07  7 2023-11-22   22
 8: 2023-11-08  8 2023-11-23   23
 9: 2023-11-09  9 2023-11-24   24
10: 2023-11-10 10 2023-11-25   25
11: 2023-11-11 11 2023-11-26   26
12: 2023-11-12 12 2023-11-27   27
13: 2023-11-13 13 2023-11-28   46
14: 2023-11-14 14 2023-11-29   34
15: 2023-11-15 15 2023-11-30   101
16: 2023-11-16 16 2023-11-01    1
17: 2023-11-17 17 2023-11-02    2
18: 2023-11-18 18 2023-11-03    3
19: 2023-11-19 19 2023-11-04    4
20: 2023-11-20 20 2023-11-05    5
21: 2023-11-21 21 2023-11-06    6
22: 2023-11-22 22 2023-11-07    7
23: 2023-11-23 23 2023-11-08    8
24: 2023-11-24 24 2023-11-09    9
25: 2023-11-25 25 2023-11-10   10
26: 2023-11-26 26 2023-11-11   11
27: 2023-11-27 27 2023-11-12   12
28: 2023-11-28 46 2023-11-13   13
29: 2023-11-29 34 2023-11-14   14
30: 2023-11-30 101 2023-11-15   15

非常感谢您的帮助。 我尝试过如下代码:

dt[,b_m1:=dt[a_m1==dt$a]$b,]

但是没有效果。

r data.table
1个回答
1
投票

我想你想要一个自加入作业。

dt[dt, b_m1 := i.b, on = .(a == a_m1)]
#              a     b       a_m1  b_m1
#         <Date> <num>     <Date> <num>
#  1: 2023-11-01     1 2023-11-16    16
#  2: 2023-11-02     2 2023-11-17    17
#  3: 2023-11-03     3 2023-11-18    18
#  4: 2023-11-04     4 2023-11-19    19
#  5: 2023-11-05     5 2023-11-20    20
#  6: 2023-11-06     6 2023-11-21    21
#  7: 2023-11-07     7 2023-11-22    22
#  8: 2023-11-08     8 2023-11-23    23
#  9: 2023-11-09     9 2023-11-24    24
# 10: 2023-11-10    10 2023-11-25    25
# ---                                  
# 21: 2023-11-21    21 2023-11-06     6
# 22: 2023-11-22    22 2023-11-07     7
# 23: 2023-11-23    23 2023-11-08     8
# 24: 2023-11-24    24 2023-11-09     9
# 25: 2023-11-25    25 2023-11-10    10
# 26: 2023-11-26    26 2023-11-11    11
# 27: 2023-11-27    27 2023-11-12    12
# 28: 2023-11-28    46 2023-11-13    13
# 29: 2023-11-29    34 2023-11-14    14
# 30: 2023-11-30   101 2023-11-15    15

all.equal(dt, desired)
# [1] TRUE

一般来说,这是一个合并/连接,StackOverflow 上一个很好的

data.table
特定问答是 使用 data.table 进行左连接。有关更一般术语的概念,请参阅


数据:

# calculated from the question
dt <- data.table::as.data.table(structure(list(a = structure(c(19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676, 19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691), class = "Date"), b = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 46, 34, 101), a_m1 = structure(c(19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691,  19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676), class = "Date"), b_m1 = c(16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 46, 34, 101, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)), row.names = c(NA, -30L), class = c("data.table", "data.frame")))
# copied and Date-ified from the question text
desired <- data.table::as.data.table(structure(list(a = structure(c(19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676, 19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687, 19688, 19689, 19690, 19691), class = "Date"), b = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 46L, 34L, 101L), a_m1 = structure(c(19677, 19678, 19679, 19680, 19681, 19682, 19683, 19684, 19685, 19686, 19687,  19688, 19689, 19690, 19691, 19662, 19663, 19664, 19665, 19666, 19667, 19668, 19669, 19670, 19671, 19672, 19673, 19674, 19675, 19676), class = "Date"), b_m1 = c(16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 46L, 34L, 101L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L)), row.names = c(NA, -30L), class = c("data.table", "data.frame")))
© www.soinside.com 2019 - 2024. All rights reserved.