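I want to create a data frame whose first column holds every date in a given period and whose second column holds the number of events that occurred on each date, including dates on which no events occurred. I also want to count the events that have a particular factor level assigned.
I have a first data frame of events, where each event has the date on which it occurred: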
Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11
In the Sex column, 1 means female and 2 means male.
The second data frame holds the dates I am examining:
Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14
I would like a data frame that looks like this:
Row Date Events (All) Events (Female) Events (Male)
1 2004-01-05 1 0 1
2 2004-01-06 1 1 0
3 2004-01-07 0 0 0
4 2004-01-08 0 0 0
5 2004-01-09 0 0 0
6 2004-01-10 1 1 0
7 2004-01-11 0 0 0
8 2004-01-12 0 0 0
9 2004-01-13 0 0 0
10 2004-01-14 0 0 0
Can anyone help?
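library(data.table)
library(magrittr) # just for %>%
out <- dat1 %>%
  # count events per date, one column per Sex code ("1" and "2")
  dcast(Date ~ Sex, data = ., fun.aggregate = length) %>%
  setnames(., c("1", "2"), c("Female", "Male")) %>%
  # join onto the full list of dates; dates without events become NA
  .[ dat2[ , .(Date)], on = "Date" ] %>%
  # replace the NA counts with 0
  .[, lapply(.SD, function(a) replace(a, is.na(a), 0)), ] %>%
  .[, All := Female + Male ]
out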
# Date Female Male All
# 1: 2004-01-05 0 1 1
# 2: 2004-01-06 1 0 1
# 3: 2004-01-07 0 0 0
# 4: 2004-01-08 0 0 0
# 5: 2004-01-09 0 0 0
# 6: 2004-01-10 1 0 1
# 7: 2004-01-11 0 0 0
# 8: 2004-01-12 0 0 0
# 9: 2004-01-13 0 0 0
# 10: 2004-01-14 0 0 0
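Note that lapply may not be the fastest overall way to replace NA with 0, but it gets the point across. Also, I am using magrittr::%>% only to separate the steps; this can easily be done without %>%. Data: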
dat1 <- fread(text = "
Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11")
dat2 <- fread(text = "
Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14")
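
For reference, here is a minimal sketch of the same steps written without %>%, with the NA replacement done by setnafill() instead of lapply. This is one possible variant and not necessarily the fastest; it assumes a reasonably recent data.table that provides setnafill(), and the intermediate name counts is only illustrative.

```r
library(data.table)

# Same sample data as above
dat1 <- fread(text = "Row Sex Age Date
1 2 36 2004-01-05
2 1 47 2004-01-06
3 1 26 2004-01-10
4 2 23 2004-01-20
5 1 50 2004-01-27
6 2 35 2004-01-28
7 1 35 2004-01-30
8 1 38 2004-02-06
9 2 29 2004-02-11")
dat2 <- fread(text = "Row Date
1 2004-01-05
2 2004-01-06
3 2004-01-07
4 2004-01-08
5 2004-01-09
6 2004-01-10
7 2004-01-11
8 2004-01-12
9 2004-01-13
10 2004-01-14")

# Count events per date, one column per Sex code ("1" and "2")
counts <- dcast(dat1, Date ~ Sex, fun.aggregate = length)
setnames(counts, c("1", "2"), c("Female", "Male"))

# Join onto the full list of dates; dates without events get NA counts
out <- counts[dat2[, .(Date)], on = "Date"]

# Replace NA with 0 by reference, only in the two count columns
setnafill(out, type = "const", fill = 0L, cols = c("Female", "Male"))

# Total number of events per date
out[, All := Female + Male]
out
```

With the sample data this should give the same ten rows as the piped version. Because setnafill() modifies out in place, no copy of the table is made for the NA replacement.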