Data:
set.seed(42)
df1 = data.frame(
Date = seq.Date(as.Date("2018-01-01"),as.Date("2018-01-30"),1),
value = sample(1:30),
Y = sample(c("yes", "no"), 30, replace = TRUE)
)
df2 = data.frame(
Date = seq.Date(as.Date("2018-01-01"),as.Date("2018-01-30"),7)
)
To sum values when the dates fall within a range, this works (from my previous question):
library(data.table)
df1$start <- df1$Date
df1$end <- df1$Date
df2$start <- df2$Date
df2$end <- df2$Date + 6
setDT(df1, key = c("start", "end"))
setDT(df2, key = c("start", "end"))
d = foverlaps(df1, df2)[, list(mySum = sum(value)), by = Date ]
How can I do a countif instead?
Because when I try
d = foverlaps(df1, df2)[, list(mySum = count(value)), by = Date ]
I get the error
no applicable method for 'group' applied to an object of class "c('double', 'numeric')"
d = foverlaps(df1, df2)[, .N, by = Date]
We can use .N:
foverlaps(df1, df2)[, list(myCount = .N), by = Date ]
# Date myCount
# 1: 2018-01-01 7
# 2: 2018-01-08 7
# 3: 2018-01-15 7
# 4: 2018-01-22 7
# 5: 2018-01-29 2
If you want to count the number of rows per date, you can try .N
foverlaps(df1, df2)[, .(mysum = .N), by = Date ]
Date mysum
1: 2018-01-01 7
2: 2018-01-08 7
3: 2018-01-15 7
4: 2018-01-22 7
5: 2018-01-29 2
If you want the count of unique values per date, you can try uniqueN()
foverlaps(df1, df2)[, .(mysum = uniqueN(value)), by = Date ]
Date mysum
1: 2018-01-01 7
2: 2018-01-08 7
3: 2018-01-15 7
4: 2018-01-22 7
5: 2018-01-29 2
Both .N and uniqueN() come from {data.table}.
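If "countif" means counting only the rows that meet a condition, one possible sketch is to sum a logical vector inside each group. The condition Y == "yes" and the column names nRows and nYes are assumptions chosen for illustration; the question does not say which condition is wanted.

```r
library(data.table)

# Same example data as in the question
set.seed(42)
df1 <- data.table(
  Date  = seq.Date(as.Date("2018-01-01"), as.Date("2018-01-30"), 1),
  value = sample(1:30),
  Y     = sample(c("yes", "no"), 30, replace = TRUE)
)
df2 <- data.table(
  Date = seq.Date(as.Date("2018-01-01"), as.Date("2018-01-30"), 7)
)

# Each df1 row is a one-day interval; each df2 row is a seven-day window
df1[, `:=`(start = Date, end = Date)]
df2[, `:=`(start = Date, end = Date + 6)]
setkey(df1, start, end)
setkey(df2, start, end)

# After the overlap join, Date is the window start taken from df2
ov <- foverlaps(df1, df2)

# .N counts every row in the window; sum() over a logical vector counts
# only the rows where the (assumed) condition Y == "yes" holds
res <- ov[, .(nRows = .N, nYes = sum(Y == "yes")), by = Date]
print(res)
```

Any other condition can be swapped in the same way, for example sum(value > 10).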
Instead of list(mySum = count(value)), try c(mySum = count(value)). With that change, the code runs for me.
d2 <- foverlaps(df1, df2)[, c(mySum = count(value)), by = Date ]