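I have event dates that I want to match to periods for which I only have start dates. As a simplified reprex, say I want to figure out who was president at the time of certain events, but I only have the inauguration dates.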
pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush",
                            "Bill Clinton", "George W. Bush",
                            "Barack Obama", "Donald Trump"),
                   inaugdate = structure(c(4037, 6959, 8420, 11342, 14264,
                                           17186), class = "Date"))
events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion",
                               "Hurricane Katrina", "9-11"),
                     date = structure(c(5871, 5959, 13024, 11576), class = "Date"))
Obviously a simple left_join won't work, because the events didn't happen on inauguration days.
library(dplyr)
events %>%
  left_join(pres, by = c("date" = "inaugdate"))
In Excel, VLOOKUP gives you the option of TRUE (closest previous match) or FALSE (exact match). Is there something similar in the tidyverse?
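VLOOKUP with TRUE is a "last value less than or equal to the key" lookup. For comparison (this is not from the answers below and is not a tidyverse function), base R's findInterval() does exactly that kind of lookup against a sorted vector. A minimal sketch, assuming the inauguration dates are sorted ascending, as they are here:

```r
# Base R analogue of VLOOKUP(..., TRUE): for each event date, find the position
# of the last inauguration date that is <= the event date.
# findInterval() requires its second argument to be sorted in ascending order.
inaug <- structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date")
presidents <- c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
                "George W. Bush", "Barack Obama", "Donald Trump")
event_dates <- structure(c(5871, 5959, 13024, 11576), class = "Date")

# Compare on day counts explicitly
idx <- findInterval(as.numeric(event_dates), as.numeric(inaug))

# idx is 0 for dates before the first inauguration; turn those into NA
# (indexing with 0 would silently drop elements, indexing with NA keeps the slot)
matched <- presidents[replace(idx, idx == 0, NA)]
print(matched)
```

With the default settings each interval is closed on the left, so a date that falls exactly on an inauguration day maps to the incoming president.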
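Here is one way to get the desired result, though it could probably be tidied up a bit. You can create intervals, a class provided by lubridate that represents a time span with a specific start and end time. It comes with the %within% operator, which tests whether a date lies inside an interval. So we first create the intervals and make the pres column character so that we can index it properly. Then we iterate over the event dates with map_chr, using a function that says "check whether this date is within each interval, get the index of the one it is actually in (with which), and return the corresponding president". Obviously this requires each date to fall in exactly one interval; otherwise it will fail.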
library(tidyverse)
library(lubridate)
pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush",
"Bill Clinton", "George W. Bush",
"Barack Obama", "Donald Trump"),
inaugdate = structure(c(4037, 6959, 8420, 11342, 14264,
17186), class = "Date"))
events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion",
"Hurricane Katrina", "9-11"),
date = structure(c(5871, 5959, 13024, 11576), class = "Date"))
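pres2 <- pres %>%
  mutate(
    # each presidency runs from its inauguration to the next one (today for the last)
    presidency = interval(inaugdate, lead(inaugdate, default = today())),
    # ensure character names (data.frame() created factors by default before R 4.0)
    pres = as.character(pres)
  )
# for each event date, find the interval it falls in and return that president
events %>%
  mutate(pres = map_chr(date, ~ pres2$pres[which(. %within% pres2$presidency)]))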
#> event date pres
#> 1 Challenger explosion 1986-01-28 Ronald Reagan
#> 2 Chernobyl explosion 1986-04-26 Ronald Reagan
#> 3 Hurricane Katrina 2005-08-29 George W. Bush
#> 4 9-11 2001-09-11 George W. Bush
Created on 2019-02-04 by the reprex package (v0.2.1)
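The same interval-membership idea can be sketched in base R without lubridate or purrr. This is not the answer's code: it swaps lubridate intervals for a logical matrix built with outer(), and it assumes half-open intervals [inauguration, next inauguration) so that no date can fall in two presidencies. As in the answer, the last presidency is assumed to run until today.

```r
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

# Interval bounds as day counts; each end is the next start (exclusive).
# Assumption: the final presidency is treated as ending today.
starts <- as.numeric(pres$inaugdate)
ends   <- c(starts[-1], as.numeric(Sys.Date()))

d <- as.numeric(events$date)
# inside[i, j] is TRUE when event i falls in presidency j
inside <- outer(d, starts, ">=") & outer(d, ends, "<")

# Return the single matching president, or NA if there are zero or several hits
events$pres <- apply(inside, 1, function(row) {
  hit <- which(row)
  if (length(hit) == 1) pres$pres[hit] else NA_character_
})
print(events)
```

Returning NA instead of failing when a date matches zero or several intervals is a design choice; the answer's map_chr() version errors in that case, which can be preferable if such dates indicate bad data.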
Maybe not the most efficient, but we can use an inequality (non-equi) join with sqldf:
library(sqldf)
sqldf('select a.event, a.date, b.pres
from events a
left join pres b
on a.date >= b.inaugdate
group by a.event
having min(a.date - b.inaugdate)
order by date, event')
Output:
event date pres
1 Challenger explosion 1986-01-28 Ronald Reagan
2 Chernobyl explosion 1986-04-26 Ronald Reagan
3 9-11 2001-09-11 George W. Bush
4 Hurricane Katrina 2005-08-29 George W. Bush
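The logic of that query (join each event to every presidency that started on or before it, then keep the closest one) can also be written in base R without a database. A minimal sketch, assuming the same `pres` and `events` data; merge() with `by = NULL` produces the Cartesian product, so memory use grows with the product of the two row counts:

```r
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

# Cartesian product: every event paired with every president
pairs <- merge(events, pres, by = NULL)

# The inequality join condition: presidency started on or before the event
pairs <- pairs[pairs$date >= pairs$inaugdate, ]

# Within each event, sort by the gap in days and keep the smallest one,
# i.e. the latest inauguration on or before the event
gap <- as.numeric(pairs$date) - as.numeric(pairs$inaugdate)
pairs <- pairs[order(pairs$event, gap), ]
result <- pairs[!duplicated(pairs$event), c("event", "date", "pres")]

# Order like the SQL query: by date, then event
result <- result[order(as.numeric(result$date), result$event), ]
rownames(result) <- NULL
print(result)
```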
Perhaps not efficient (it depends on the number of rows and columns), but here is another way to solve this.
library(dplyr)
pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush",
"Bill Clinton", "George W. Bush", "Barack Obama", "Donald Trump"),
inaugdate = structure(c(4037, 6959, 8420, 11342, 14264,
17186), class = "Date")) %>%
#lead date to get interval
mutate(enddt = lead(inaugdate, default = Sys.Date())-1)
events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion", "Hurricane Katrina", "9-11"),
date = structure(c(5871, 5959, 13024, 11576), class = "Date"))
#get every combination of rows
newdf <- merge(pres,events,all = TRUE) %>%
  # enddt is the day before the next inauguration, so it must be included
  filter(date >= inaugdate, date <= enddt)
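Because `enddt` is defined as the day before the next inauguration, whether the upper bound is compared with `<` or `<=` decides what happens to events on that last day. A small base R check, using a hypothetical event date chosen only because it sits on that boundary:

```r
# Reagan's term in the data ends when Bush is inaugurated on 1989-01-20
next_inaug <- as.Date("1989-01-20")
enddt <- next_inaug - 1                # 1989-01-19, as computed with lead(...) - 1
event_date <- as.Date("1989-01-19")    # hypothetical event on that last day

strict    <- event_date <  enddt       # strict bound: the event is dropped
inclusive <- event_date <= enddt       # inclusive bound: the event is kept
print(c(strict = strict, inclusive = inclusive))
```

An alternative that avoids the off-by-one entirely is to skip the `- 1` and use `date < lead(inaugdate, default = Sys.Date())` as the upper bound.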