Conditionally mutate a new column based on dates and predefined values - data.table


Data:

DT<-data.table::data.table(
          ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
       C_OPR = c("ABCD01", "ABCD11", NA, "EFGH", NA, NA, "KLMN", NA),
       D_OPR = c(NA, NA, "PQRST", NA, "EFGHIJ", NA, NA, NA),
        DATE = c("2007-07-07","2005-05-05","2002-02-02",
                 "2002-02-02","2004-04-04",NA,"2001-01-01",NA),
   INDX_DATE = c("2006-06-06","2006-06-06","2006-06-06",
                 "2001-01-01","2001-01-01","2001-01-01","2005-05-05",
                 "2005-05-05")
)

ALFA_DEF<-c("ABCD","EFGH")

Output:

   ID  C_OPR  D_OPR       DATE  INDX_DATE
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06
3:  1   <NA>  PQRST 2002-02-02 2006-06-06
4:  2   EFGH   <NA> 2002-02-02 2001-01-01
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01
6:  2   <NA>   <NA>       <NA> 2001-01-01
7:  3   KLMN   <NA> 2001-01-01 2005-05-05
8:  3   <NA>   <NA>       <NA> 2005-05-05

Desired output:

   ID  C_OPR  D_OPR       DATE  INDX_DATE ALFA
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06    1
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06    1
3:  1   <NA>  PQRST 2002-02-02 2006-06-06    1
4:  2   EFGH   <NA> 2002-02-02 2001-01-01    0
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01    0
6:  2   <NA>   <NA>       <NA> 2001-01-01    0
7:  3   KLMN   <NA> 2001-01-01 2005-05-05    0
8:  3   <NA>   <NA>       <NA> 2005-05-05    0

Logic:

If any C_OPR or D_OPR value contains one of the codes in ALFA_DEF, with DATE earlier than INDX_DATE, then ALFA = 1 for the whole group (ID); otherwise ALFA = 0.

Attempt without the date condition:

DT[, ALFA := +any(grepl(paste0(ALFA_DEF, collapse = "|"), c(D_OPR, C_OPR))), by = ID]

A data.table solution is preferred, but dplyr is also welcome.
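One possible way to add the date condition is sketched below. It assumes that the code match and the date comparison must both hold on the same row, and that DATE and INDX_DATE are ISO-formatted strings that `as.IDate()` can parse. The helper column `hit` and the variable `pattern` are names introduced here for illustration only.

```r
library(data.table)

DT <- data.table(
  ID        = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  C_OPR     = c("ABCD01", "ABCD11", NA, "EFGH", NA, NA, "KLMN", NA),
  D_OPR     = c(NA, NA, "PQRST", NA, "EFGHIJ", NA, NA, NA),
  DATE      = c("2007-07-07", "2005-05-05", "2002-02-02",
                "2002-02-02", "2004-04-04", NA, "2001-01-01", NA),
  INDX_DATE = c("2006-06-06", "2006-06-06", "2006-06-06",
                "2001-01-01", "2001-01-01", "2001-01-01",
                "2005-05-05", "2005-05-05")
)

ALFA_DEF <- c("ABCD", "EFGH")

# One regex that matches any of the predefined codes
pattern <- paste0(ALFA_DEF, collapse = "|")

# Row-level flag (hypothetical helper column `hit`):
# a code matches in either column AND DATE is before INDX_DATE.
# grepl() returns FALSE for NA input; a missing DATE yields NA or FALSE here.
DT[, hit := (grepl(pattern, C_OPR) | grepl(pattern, D_OPR)) &
            as.IDate(DATE) < as.IDate(INDX_DATE)]

# Group-level result: 1 if any row in the ID group qualifies, else 0
DT[, ALFA := +any(hit, na.rm = TRUE), by = ID]

# Drop the helper column
DT[, hit := NULL]

print(DT)
```

Computing the row-level flag first and aggregating afterwards keeps the date parsing outside the grouped step. If the dates are needed elsewhere, converting DATE and INDX_DATE to `IDate` columns once up front would be a reasonable alternative.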


r data.table