I have a data frame (df1) with many rows of species, each with the date-time at which that species occurred, as follows:
df1 <- as.data.frame(sample(seq(from=as.POSIXct("2023-07-01 00:00"),
to=as.POSIXct("2023-07-01 00:20"), by="sec"), 21))
df1
colnames(df1) <- c('day.hour.df1') # rename column of df1
# add species column
df1$Species <- c("a", "b", "b", "a", "c", "NA", "a", "a", "c", "b", "b",
"c", "c", "NA", "a", "a", "b", "b", "a", "NA", "b")
names(df1)
I have a second data frame (df2) whose columns are a start date-time (START.df2), an end date-time (END.df2) and a species (here "a"). The second data frame looks like this:
df2 <- as.data.frame(seq(from=as.POSIXct("2023-07-01 00:00:00"),
to=as.POSIXct("2023-07-01 00:20:00"), by="min"))
df2
df2$time2 <- (seq(from=as.POSIXct("2023-07-01 00:00:59"),
to=as.POSIXct("2023-07-01 00:20:59"), by="min"))
names(df2)
df2$species <- (c('a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a',
'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a'))
df2
colnames(df2) <- c('START.df2', 'END.df2', 'Species') # rename columns of df2
I want to add a new column named "occurrence_a" to the second data frame (df2). It should be 1 if species "a" in df1 has a date-time (day.hour.df1) between the START (START.df2) and END (END.df2) of a row of df2, and 0 otherwise.
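To make the rule concrete, here is a minimal base-R sketch of the intended result. It assumes both bounds are inclusive (as `between()` is) and uses small hand-made timestamps instead of the random `sample()` so the output is reproducible; the toy values are illustrative only.

```r
# Toy version of df1: event times and the species observed at each time
df1 <- data.frame(
  day.hour.df1 = as.POSIXct(c("2023-07-01 00:00:10", "2023-07-01 00:00:30",
                              "2023-07-01 00:01:20", "2023-07-01 00:02:05"),
                            tz = "UTC"),
  Species = c("a", "b", "b", "a")
)

# Toy version of df2: one-minute windows [START.df2, END.df2]
df2 <- data.frame(
  START.df2 = as.POSIXct(c("2023-07-01 00:00:00", "2023-07-01 00:01:00",
                           "2023-07-01 00:02:00"), tz = "UTC")
)
df2$END.df2 <- df2$START.df2 + 59  # + 59 seconds
df2$Species <- "a"

# Times at which species "a" was observed
times_a <- df1$day.hour.df1[df1$Species == "a"]

# For each window: 1 if any "a" event lies within [START, END] (inclusive), else 0
df2$occurrence_a <- vapply(
  seq_len(nrow(df2)),
  function(i) as.integer(any(times_a >= df2$START.df2[i] & times_a <= df2$END.df2[i])),
  integer(1)
)

df2
```

Here the first and third windows contain an "a" event, while the second contains only a "b" event, so `occurrence_a` is 1, 0, 1.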
I tried the following, without success:
library(dplyr)
df2 %>% left_join(df1, by = "Species") %>%
mutate(OCCURRENCEa = between(day.hour.df2, START.df1, END.df1)) %>%
group_by(species, day.hour.df1) %>%
summarise(OCCURRENCEa = any(OCCURRENCEa))
df2
I'm not entirely sure what output you want, but I think this should at least come close:
library(dplyr)  # join_by() needs dplyr >= 1.1.0
library(tidyr)  # for pivot_wider()

left_join(df2, df1, join_by(Species, between(y$day.hour.df1, x$START.df2, x$END.df2))) %>%
  group_by(Species, START.df2, END.df2) %>%
  summarise(
    OCCURRENCE = any(!is.na(day.hour.df1)) %>% as.numeric(),
    .groups = 'drop'
  ) %>%
  pivot_wider(names_from = Species, values_from = OCCURRENCE, names_prefix = 'OCCURRENCE')
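For this example you don't actually need the pivot step, but I assume your real data contain several species and you need a separate column for each.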
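If you would rather stay in base R, a rough equivalent that builds one `occurrence_<species>` column per species without a join or pivot might look like the sketch below. `window_hits()` is a hypothetical helper introduced only for this illustration; the sketch assumes inclusive bounds and skips the "NA" string used as a species label. Unlike the join, it compares every event with every window, so it only suits modestly sized data.

```r
# Hypothetical helper: for each window [start[i], end[i]], return 1 if any of
# `times` falls inside it (bounds inclusive), otherwise 0
window_hits <- function(times, start, end) {
  vapply(seq_along(start),
         function(i) as.integer(any(times >= start[i] & times <= end[i])),
         integer(1))
}

# Toy data standing in for df1 (events) and df2 (windows)
base_time <- as.POSIXct("2023-07-01 00:00:00", tz = "UTC")
df1 <- data.frame(
  day.hour.df1 = base_time + c(10, 30, 80, 125, 140),  # seconds after base_time
  Species = c("a", "b", "b", "a", "NA")
)
df2 <- data.frame(START.df2 = base_time + c(0, 60, 120))
df2$END.df2 <- df2$START.df2 + 59

# One column per observed species, skipping the "NA" placeholder label
for (sp in setdiff(unique(df1$Species), "NA")) {
  times <- df1$day.hour.df1[df1$Species == sp]
  df2[[paste0("occurrence_", sp)]] <- window_hits(times, df2$START.df2, df2$END.df2)
}

df2
```

With these toy values, `occurrence_a` is 1, 0, 1 and `occurrence_b` is 1, 1, 0.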