I have a dataset such as:
ID DATETIME CODE Value
999 1/2/2024 16:22 TX 100
123 1/2/2024 16:47 IP 100
666 1/2/2024 17:13 IP 85
666 1/2/2024 17:38 IP 100
123 1/2/2024 18:03 TX 90
666 1/2/2024 18:28 TX 85
666 1/2/2024 18:54 IP 100
123 1/2/2024 19:19 CA 100
666 1/2/2024 19:44 OX 95
999 1/2/2024 20:09 18 75
123 1/2/2024 20:35 12 100
654 1/2/2024 21:00 IP 85
Here is the dput() output for the data above:
structure(list(ID = c("999", "123", "666", "666", "123", "666",
"666", "123", "666", "999", "123", "654"), DATETIME = structure(c(1706804520,
1706806020, 1706807580, 1706809080, 1706810580, 1706812080, 1706813640,
1706815140, 1706816640, 1706818140, 1706819700, 1706821200), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), CODE = c("TX", "IP", "IP", "IP", "TX",
"TX", "IP", "CA", "OX", "18", "12", "IP"), Value = c(100, 100,
85, 100, 90, 85, 100, 100, 95, 75, 100, 85)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -12L))
I want to add a column holding the cumulative sum of Value per ID over the previous 2 hours, subject to a condition (CODE == "IP"). Like this:
ID DATETIME CODE Value cum_IP
999 1/2/2024 16:22 TX 100 0
123 1/2/2024 16:47 IP 100 100
666 1/2/2024 17:13 IP 85 85
666 1/2/2024 17:38 IP 100 185
123 1/2/2024 18:03 TX 90 0
666 1/2/2024 18:28 TX 85 0
666 1/2/2024 18:54 IP 100 285
123 1/2/2024 19:19 CA 100 0
666 1/2/2024 19:44 OX 95 0
999 1/2/2024 20:09 18 75 0
123 1/2/2024 20:35 12 100 0
654 1/2/2024 21:00 IP 85 85
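I hope I made no mistakes computing the column by hand, but the idea should be clear. I am looking for a function that takes a grouping variable (ID), a predicate (here CODE == "IP", but it could be e.g. is.number(CODE)) and an aggregation (a sum or a simple count), and applies that aggregation to the rows in a window (the 2 hours up to the current row).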
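The request above can be sketched in base R as follows. This is only a minimal illustration, not a tuned solution: the helper name roll_by_group and all of its argument names are made up for this example, the predicate is passed as a function of the data frame, and rows that fail the predicate are set to 0 to match the hand-computed table. It compares every row with every other row, so it is quadratic in the number of rows.

```r
# Same data as in the question, rebuilt as a plain data frame.
dat <- data.frame(
  ID = c("999", "123", "666", "666", "123", "666",
         "666", "123", "666", "999", "123", "654"),
  DATETIME = as.POSIXct(
    c(1706804520, 1706806020, 1706807580, 1706809080, 1706810580, 1706812080,
      1706813640, 1706815140, 1706816640, 1706818140, 1706819700, 1706821200),
    origin = "1970-01-01", tz = "UTC"),
  CODE = c("TX", "IP", "IP", "IP", "TX", "TX", "IP", "CA", "OX", "18", "12", "IP"),
  Value = c(100, 100, 85, 100, 90, 85, 100, 100, 95, 75, 100, 85)
)

# Hypothetical helper (name and arguments are assumptions for this sketch).
# id, time, value : column names (character)
# pred            : function(data) returning one logical per row
# fun             : aggregation applied to the matching values
# window          : window length in seconds, counted back from the current row
roll_by_group <- function(data, id, time, value, pred, fun = sum,
                          window = 2 * 60 * 60) {
  keep <- pred(data)
  keep[is.na(keep)] <- FALSE
  secs <- as.numeric(data[[time]])
  vapply(seq_len(nrow(data)), function(i) {
    if (!keep[i]) return(0)  # rows failing the predicate get 0
    in_win <- keep & data[[id]] == data[[id]][i] &
      secs >= secs[i] - window & secs <= secs[i]
    as.numeric(fun(data[[value]][in_win]))
  }, numeric(1))
}

is_ip <- function(d) d$CODE == "IP"

# Rolling sum of Value over the trailing 2 hours, per ID, IP rows only.
dat$cum_IP <- roll_by_group(dat, "ID", "DATETIME", "Value", is_ip, fun = sum)
dat$cum_IP
# 0 100 85 185 0 0 285 0 0 0 0 85

# Swapping the aggregation for length gives a simple count instead.
dat$n_IP <- roll_by_group(dat, "ID", "DATETIME", "Value", is_ip, fun = length)
dat$n_IP
# 0 1 1 2 0 0 3 0 0 0 0 1
```

The window is closed at both ends here (from 2 hours before the current row up to and including it); change the comparison operators if a different boundary rule is wanted.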
Here is a SQL left self-join, with the data frame above stored as dat:
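library(sqldf)
# Join each row (a) to the rows (b) that share its ID and fall in the
# 2 hours (2 * 60 * 60 seconds) up to and including a's DATETIME, then
# sum b's Value over the IP rows only. Multiplying by (a.CODE = 'IP')
# zeroes the result on rows that are not IP themselves.
sqldf("select a.*, (a.CODE = 'IP') * sum(b.Value * (b.CODE = 'IP')) cum_IP
  from dat a
  left join dat b on a.ID = b.ID and
    b.DATETIME between a.DATETIME - 2 * 60 * 60 and a.DATETIME
  group by a.rowid")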
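giving the following (DATETIME prints five hours behind the UTC input, presumably because it is displayed in the local time zone of the session that ran the query):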
ID DATETIME CODE Value cum_IP
1 999 2024-02-01 11:22:00 TX 100 0
2 123 2024-02-01 11:47:00 IP 100 100
3 666 2024-02-01 12:13:00 IP 85 85
4 666 2024-02-01 12:38:00 IP 100 185
5 123 2024-02-01 13:03:00 TX 90 0
6 666 2024-02-01 13:28:00 TX 85 0
7 666 2024-02-01 13:54:00 IP 100 285
8 123 2024-02-01 14:19:00 CA 100 0
9 666 2024-02-01 14:44:00 OX 95 0
10 999 2024-02-01 15:09:00 18 75 0
11 123 2024-02-01 15:35:00 12 100 0
12 654 2024-02-01 16:00:00 IP 85 85
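For readers who would rather stay in base R, the same self-join idea can be sketched with merge(). This is an illustrative translation under the same assumptions as the query above (window closed at both ends, non-IP rows set to 0), not part of the sqldf answer. The helper column row and the names pairs, age and contrib are introduced only for this example.

```r
# Same data as in the question, rebuilt as a plain data frame.
dat <- data.frame(
  ID = c("999", "123", "666", "666", "123", "666",
         "666", "123", "666", "999", "123", "654"),
  DATETIME = as.POSIXct(
    c(1706804520, 1706806020, 1706807580, 1706809080, 1706810580, 1706812080,
      1706813640, 1706815140, 1706816640, 1706818140, 1706819700, 1706821200),
    origin = "1970-01-01", tz = "UTC"),
  CODE = c("TX", "IP", "IP", "IP", "TX", "TX", "IP", "CA", "OX", "18", "12", "IP"),
  Value = c(100, 100, 85, 100, 90, 85, 100, 100, 95, 75, 100, 85)
)

# Row identifier, playing the role of rowid in the SQL version.
dat$row <- seq_len(nrow(dat))

# Self-join on ID: every row (.a) is paired with every row (.b) of the same ID.
pairs <- merge(dat, dat, by = "ID", suffixes = c(".a", ".b"))

# Keep only the contribution of b rows that are IP and lie within the
# 2 hours up to and including a's timestamp.
age <- as.numeric(pairs$DATETIME.a) - as.numeric(pairs$DATETIME.b)
in_win <- age >= 0 & age <= 2 * 60 * 60
contrib <- pairs$Value.b * (pairs$CODE.b == "IP") * in_win

# Aggregate per left-hand row, then zero out rows that are not IP themselves.
tot <- tapply(contrib, pairs$row.a, sum)
dat$cum_IP <- as.numeric(tot[as.character(dat$row)]) * (dat$CODE == "IP")
dat$row <- NULL

dat$cum_IP
# 0 100 85 185 0 0 285 0 0 0 0 85
```

Because the timestamps never leave R, DATETIME keeps its UTC time zone in this version.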