I have the following R script.
# input here is a data frame, and note it has columns col1, col2, id, date
output <- input[, j = list(
  col1 = sum(col1),
  col2 = sum(col2)),
  by = date]
I don't know any R, but as I understand it, this code sums each column, grouped by the date column.
I want to change the col2 calculation so that it excludes rows whose id is in do_not_include_ids.
So I tried the following:
rows_filtered <- !(input$id %in% do_not_include_ids)
output <- input[, j = list(
  col1 = sum(col1),
  col2 = sum(col2[rows_filtered])),
  by = date]
However, this seems to just sum all of col2[rows_filtered] and assign that value to every date, so the grouping does not seem to take effect here.
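A minimal sketch of what may be going wrong, using hypothetical toy data (dt and its values are made up for illustration and are not from the script above). The mask rows_filtered is built from the whole table, so it has one entry per row of the full input, whereas inside a grouped j expression col2 holds only the rows of the current group. The two therefore no longer line up:

```r
library(data.table)

# Hypothetical toy data, not taken from the original script
dt <- data.table(
  col2 = c(10L, 20L, 30L, 40L),
  id   = c(1L, 2L, 1L, 2L),
  date = c("d1", "d1", "d2", "d2")
)
do_not_include_ids <- c(1)

# Mask built outside the grouped call: one entry per row of the whole table
rows_filtered <- !(dt$id %in% do_not_include_ids)

# Inside j, col2 contains only the current group's rows, while
# rows_filtered is still the full-length vector from the calling scope
lengths_by_group <- dt[, .(
  n_col2 = length(col2),
  n_mask = length(rows_filtered)
), by = date]
print(lengths_by_group)
```

In this sketch each group sees two values of col2 but a mask of length four, so the mask cannot pick out that group's own rows.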
library(data.table)
input <- structure(list(col1 = c(13L, 16L, 1L, 4L, 4L, 8L), col2 = c(10L,
20L, 22L, 19L, 5L, 24L), id = c(1L, 1L, 5L, 3L, 2L, 1L), date = c("2021-01-01",
"2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05", "2021-01-06"
)), class = "data.frame", row.names = c("0", "1", "2", "3", "4",
"5"))
setDT(input)
do_not_include_ids <- c(1, 5)
output <- input[, .(
  col1_sum = sum(col1),
  col2_sum = sum(col2[!id %in% do_not_include_ids])
), by = date]
         date col1_sum col2_sum
1: 2021-01-01       13        0
2: 2021-01-02       16        0
3: 2021-01-03        1        0
4: 2021-01-04        4       19
5: 2021-01-05        4        5
6: 2021-01-06        8        0
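This works because id is referenced inside the grouped expression, so it is subset per group together with col2. A possible variant, sketched here under the assumption that adding a column to the table is acceptable, stores the mask as a column so that it is also split by group. The names keep and output_alt are hypothetical and chosen only for this illustration:

```r
library(data.table)

# Same sample values as above, rebuilt so the sketch is self-contained
input <- data.table(
  col1 = c(13L, 16L, 1L, 4L, 4L, 8L),
  col2 = c(10L, 20L, 22L, 19L, 5L, 24L),
  id   = c(1L, 1L, 5L, 3L, 2L, 1L),
  date = c("2021-01-01", "2021-01-02", "2021-01-03",
           "2021-01-04", "2021-01-05", "2021-01-06")
)
do_not_include_ids <- c(1, 5)

# Hypothetical helper column: TRUE for rows that should count towards col2
input[, keep := !(id %in% do_not_include_ids)]

# keep is a column, so within each group it has the same length as col2
output_alt <- input[, .(
  col1_sum = sum(col1),
  col2_sum = sum(col2[keep])
), by = date]
print(output_alt)
```

Groups in which every row is excluded get a col2_sum of 0, because the sum of an empty vector is 0.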