This is a data.table in which each id has some parameters, arranged by regular quarterly dates. It was originally shuffled, but to start with, let's say it is sorted by id and fab_date.
library(data.table)
set.seed(1)
dt_to_fun <- data.table(fab_date = structure(c(18993, 19174, 19358, 19539,
18993, 19174, 19358, 19539, 18993, 19174, 19358, 19539, 18993,
19174, 19358, 19539, 18993, 19174, 19358, 19539), class = "Date"),
id = c("n_01", "n_01", "n_01", "n_01", "n_02", "n_02", "n_02",
"n_02", "n_03", "n_03", "n_03", "n_03", "n_04", "n_04", "n_04",
"n_04", "n_05", "n_05", "n_05", "n_05"),
param_01 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE),
param_02 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE),
param_03 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE))
dt_to_fun
fab_date id param_01 param_02 param_03
1: 2022-01-01 n_01 10 50 30
2: 2022-07-01 n_01 40 20 20
3: 2023-01-01 n_01 10 20 20
4: 2023-07-01 n_01 20 10 50
5: 2022-01-01 n_02 50 40 20
6: 2022-07-01 n_02 30 10 10
7: 2023-01-01 n_02 20 40 30
8: 2023-07-01 n_02 30 30 30
9: 2022-01-01 n_03 30 20 40
10: 2022-07-01 n_03 10 20 30
11: 2023-01-01 n_03 50 40 10
12: 2023-07-01 n_03 50 40 40
13: 2022-01-01 n_04 20 40 50
14: 2022-07-01 n_04 20 20 10
15: 2023-01-01 n_04 10 40 10
16: 2023-07-01 n_04 50 10 40
17: 2022-01-01 n_05 50 10 50
18: 2022-07-01 n_05 10 40 50
19: 2023-01-01 n_05 10 10 40
20: 2023-07-01 n_05 50 20 50
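The goal: for each id, replace the parameter values on the last date as follows: param_01(last date) = param_01(last date) + param_01(-1 period) - param_01(-2 periods). For example, for id n_05 the last value of param_02 is 20, the previous value is 10, and the value two periods back is 40, so the result should be 20 + 10 - 40 = -10. The same applies to every id and every param column. The last date is "2023-07-01", so only the parameters in rows with that date should be updated.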
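To make the arithmetic concrete, here is a minimal base R sketch of the rule for a single id and a single column. The vector `x` is simply the four param_02 values of n_05 copied from the table above in date order; the names `x`, `n` and `new_last` are made up for illustration.

```r
# param_02 values for id n_05, in date order (copied from the table above)
x <- c(10, 40, 10, 20)
n <- length(x)

# last value + previous value - value two periods back
new_last <- x[n] + x[n - 1] - x[n - 2]
new_last
# -10
```

Only the last element is recomputed; the three earlier values stay as they are.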
I managed to do this calculation, but my approach applies it to every date, whereas it should run only for the last date of each id. Here is the function:
quarterly_process_fun <- function(dt) {
  param_cols <- c("param_01", "param_02", "param_03")
  # For every row within each id: value + lag 1 - lag 2
  dt[, (param_cols) := lapply(
    .SD,
    \(x) x + data.table::shift(x, n = 1L, fill = x[1L], type = "lag") -
      data.table::shift(x, n = 2L, fill = x[1L], type = "lag")
  ),
  by = .(id),
  .SDcols = param_cols
  ]
  return(dt)
}
quarterly_process_fun(dt_to_fun)
Result:
dt_to_fun
fab_date id param_01 param_02 param_03
1: 2022-01-01 n_01 10 50 30
2: 2022-07-01 n_01 40 20 20
3: 2023-01-01 n_01 40 -10 10
4: 2023-07-01 n_01 -10 10 50
5: 2022-01-01 n_02 50 40 20
6: 2022-07-01 n_02 30 10 10
7: 2023-01-01 n_02 0 10 20
8: 2023-07-01 n_02 20 60 50
9: 2022-01-01 n_03 30 20 40
10: 2022-07-01 n_03 10 20 30
11: 2023-01-01 n_03 30 40 0
12: 2023-07-01 n_03 90 60 20
13: 2022-01-01 n_04 20 40 50
14: 2022-07-01 n_04 20 20 10
15: 2023-01-01 n_04 10 20 -30
16: 2023-07-01 n_04 40 30 40
17: 2022-01-01 n_05 50 10 50
18: 2022-07-01 n_05 10 40 50
19: 2023-01-01 n_05 -30 40 40
20: 2023-07-01 n_05 50 -10 40
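So, how can I adjust it so that it computes and replaces the values only for the last date of each id?
A second question: can this also be done on shuffled data?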
set.seed(1)
dt_to_fun <- data.table(fab_date = structure(c(18993, 19174, 19358, 19539,
18993, 19174, 19358, 19539, 18993, 19174, 19358, 19539, 18993,
19174, 19358, 19539, 18993, 19174, 19358, 19539), class = "Date"),
id = c("n_01", "n_01", "n_01", "n_01", "n_02", "n_02", "n_02",
"n_02", "n_03", "n_03", "n_03", "n_03", "n_04", "n_04", "n_04",
"n_04", "n_05", "n_05", "n_05", "n_05"),
param_01 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE),
param_02 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE),
param_03 = sample(c(10,20, 30, 40, 50), 20, replace = TRUE))
#shuffle rows
dt_to_fun <- dt_to_fun[sample(nrow(dt_to_fun)),]
fab_date id param_01 param_02 param_03
1: 2023-07-01 n_03 50 40 40
2: 2023-07-01 n_04 50 10 40
3: 2022-01-01 n_01 10 50 30
4: 2022-01-01 n_04 20 40 50
5: 2022-01-01 n_02 50 40 20
6: 2023-01-01 n_04 10 40 10
7: 2022-07-01 n_02 30 10 10
8: 2022-07-01 n_05 10 40 50
9: 2022-01-01 n_03 30 20 40
10: 2023-01-01 n_02 20 40 30
11: 2023-01-01 n_03 50 40 10
12: 2023-01-01 n_01 10 20 20
13: 2022-07-01 n_04 20 20 10
14: 2022-07-01 n_01 40 20 20
15: 2022-07-01 n_03 10 20 30
16: 2023-07-01 n_05 50 20 50
17: 2023-07-01 n_01 20 10 50
18: 2023-07-01 n_02 30 30 30
19: 2023-01-01 n_05 10 10 40
20: 2022-01-01 n_05 50 10 50
We can adjust your function slightly: first we sort by id and fab_date, then we use data.table's .I (row index) to identify the row with the last date for each id.
library(data.table)
quarterly_process_fun <- function(dt) {
  param_cols <- c("param_01", "param_02", "param_03")
  # Sort by id and fab_date so the lags follow date order within each id
  setorder(dt, id, fab_date)
  # Row index of the last date for each id
  last_row <- dt[, .I[fab_date == max(fab_date)], by = id]$V1
  # Compute x + lag 1 - lag 2 within each id; after sorting, rows line up with dt
  new_vals <- dt[, lapply(
    .SD,
    function(x) x + shift(x, n = 1L, fill = x[1L], type = "lag") - shift(x, n = 2L, fill = x[1L], type = "lag")
  ), by = id, .SDcols = param_cols]
  # Write the result back only into the last-date rows
  dt[last_row, (param_cols) := new_vals[last_row, param_cols, with = FALSE]]
  return(dt)
}
quarterly_process_fun(dt_to_fun)
dt_to_fun
fab_date id param_01 param_02 param_03
1: 2022-01-01 n_01 10 50 30
2: 2022-07-01 n_01 40 20 20
3: 2023-01-01 n_01 10 20 20
4: 2023-07-01 n_01 -10 10 50
5: 2022-01-01 n_02 50 40 20
6: 2022-07-01 n_02 30 10 10
7: 2023-01-01 n_02 20 40 30
8: 2023-07-01 n_02 20 60 50
9: 2022-01-01 n_03 30 20 40
10: 2022-07-01 n_03 10 20 30
11: 2023-01-01 n_03 50 40 10
12: 2023-07-01 n_03 90 60 20
13: 2022-01-01 n_04 20 40 50
14: 2022-07-01 n_04 20 20 10
15: 2023-01-01 n_04 10 40 10
16: 2023-07-01 n_04 40 30 40
17: 2022-01-01 n_05 50 10 50
18: 2022-07-01 n_05 10 40 50
19: 2023-01-01 n_05 10 10 40
20: 2023-07-01 n_05 50 -10 40
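For the shuffled case, a compact variant of the per-id update can be sketched on a tiny made-up table. Everything here (`toy`, `update_last`, the columns `day` and `val`) is hypothetical and only for illustration, and the sketch assumes every id has at least three rows. It works on a copy, so the input table is not modified by reference.

```r
library(data.table)

# Hypothetical toy data, deliberately not sorted by date
toy <- data.table(
  id  = c("a", "b", "a", "b", "a", "b"),
  day = as.Date(c("2023-07-01", "2022-01-01", "2022-01-01",
                  "2023-07-01", "2022-07-01", "2022-07-01")),
  val = c(20, 1, 40, 6, 10, 2)
)

update_last <- function(dt) {
  dt <- copy(dt)          # leave the caller's table untouched
  setorder(dt, id, day)   # lags only make sense in date order
  # Per id: overwrite the last value with last + previous - two periods back
  dt[, val := {
    n <- .N
    replace(val, n, val[n] + val[n - 1L] - val[n - 2L])
  }, by = id]
  dt[]
}

res <- update_last(toy)
res
```

After sorting, id a holds 40, 10, 20 and id b holds 1, 2, 6, so only the last row of each group changes: 20 + 10 - 40 = -10 for a and 6 + 2 - 1 = 7 for b. Because the function sorts first, the initial row order of the input does not matter.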