tidyverse: match events to specific date ranges

Question (votes: 4, answers: 3)

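I want to match events to periods for which I only have start dates. As a simplified reprex, say I want to figure out who was president at the time of certain events, but I only have the inauguration dates.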

pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush", 
                            "Bill Clinton", "George W. Bush", "Barack Obama",
                            "Donald Trump"), 
                   inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 
                                           17186), class = "Date"))

events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion",
                               "Hurricane Katrina", "9-11"), 
                     date = structure(c(5871, 5959, 13024, 11576), class = "Date"))

Obviously, a simple left_join won't work, because none of the events happened on an inauguration day.

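library(dplyr)

# Every event gets pres = NA: no event date is exactly equal to an inauguration date
events %>%
  left_join(pres, by = c("date" = "inaugdate"))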

In Excel, VLOOKUP gives you the option of TRUE (closest previous match) or FALSE (exact match). Is there something similar in the tidyverse?
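For the TRUE (closest previous match) behaviour, base R's findInterval() is one possible analogue: given a sorted vector of start dates, it returns for each value the index of the last start that is less than or equal to it, and 0 if the value comes before every start. A minimal sketch, assuming the inauguration dates can be sorted and that an event before the first inauguration should get NA; the variable idx is just an illustrative name. The same expression can be dropped into a dplyr mutate() call.

```r
# Same data as in the question, with the line-broken strings repaired
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

# findInterval() needs a sorted lookup vector, like VLOOKUP with TRUE
pres <- pres[order(pres$inaugdate), ]

# Index of the last inauguration on or before each event date (0 = before all of them)
idx <- findInterval(as.numeric(events$date), as.numeric(pres$inaugdate))
idx[idx == 0] <- NA  # no president in the table yet, so return NA

events$pres <- pres$pres[idx]
events
```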

r vlookup tidyverse
3 answers

4 votes

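Here is one way to get the desired result, though it could probably be tidied up a bit. lubridate provides an interval class, which represents a time span with a specific start and end, and it comes with the %within% operator for testing whether a date falls inside an interval. So we first build these intervals, and make the pres column character so that we can index it properly. Then we iterate over the event dates with map_chr, using a function that says "check whether this date falls in each interval, get the index of the one it is actually in (with which), and return the corresponding president". Obviously, this requires each date to fall in exactly one interval; otherwise it will fail.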

library(tidyverse)
library(lubridate)

pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush", 
                            "Bill Clinton", "George W. Bush",
                            "Barack Obama", "Donald Trump"), 
                   inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 
                                           17186), class = "Date"))

events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion",
                               "Hurricane Katrina", "9-11"), 
                     date = structure(c(5871, 5959, 13024, 11576), class = "Date"))

pres2 <- pres %>%
  mutate(
    presidency = interval(inaugdate, lead(inaugdate, default = today())),
    pres = as.character(pres)
  )
events %>%
  mutate(pres = map_chr(date, ~ pres2$pres[which(. %within% pres2$presidency)]))
#>                  event       date           pres
#> 1 Challenger explosion 1986-01-28  Ronald Reagan
#> 2  Chernobyl explosion 1986-04-26  Ronald Reagan
#> 3    Hurricane Katrina 2005-08-29 George W. Bush
#> 4                 9-11 2001-09-11 George W. Bush

Created on 2019-02-04 by the reprex package (v0.2.1)
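The interval-plus-which idea above does not strictly need lubridate. A minimal base-R sketch, assuming half-open terms [inaugdate, next inaugdate) so that an event on an inauguration day belongs only to the incoming president; lubridate's %within% includes both endpoints, so such a date would match two intervals and map_chr would fail. The helper find_pres is hypothetical and simply makes the "exactly one interval" requirement explicit.

```r
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

pres <- pres[order(pres$inaugdate), ]
start <- as.numeric(pres$inaugdate)
end <- c(start[-1], Inf)  # a term ends (exclusive) at the next inauguration; the last is open-ended

# Hypothetical helper: return the single president whose term contains d
find_pres <- function(d) {
  hit <- which(as.numeric(d) >= start & as.numeric(d) < end)
  if (length(hit) != 1) stop("date ", format(d), " matched ", length(hit), " terms")
  pres$pres[hit]
}

events$pres <- vapply(events$date, find_pres, character(1))
events
```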


1 vote

Perhaps not the most efficient, but we can use an inequality join with sqldf:

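library(sqldf)

# Inequality join: pair each event with every president inaugurated on or before it,
# then collapse to one row per event.
# Caveats: b.pres is a bare column in a grouped query, so which president is reported
# depends on SQLite's handling of bare columns next to min(); and HAVING min(...) is
# false when the smallest gap is 0, so an event on an inauguration day would be dropped.
sqldf('select a.event, a.date, b.pres
      from events a 
      left join pres b
      on a.date >= b.inaugdate
      group by a.event 
      having min(a.date - b.inaugdate)
      order by date, event')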

Output:

                 event       date           pres
1 Challenger explosion 1986-01-28  Ronald Reagan
2  Chernobyl explosion 1986-04-26  Ronald Reagan
3                 9-11 2001-09-11 George W. Bush
4    Hurricane Katrina 2005-08-29 George W. Bush
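In newer dplyr releases (1.1.0 and later) the same inequality join can be written without SQL: join_by() accepts inequality conditions, and wrapping one in closest() keeps only the nearest qualifying match, which is the VLOOKUP-TRUE behaviour the question asks about. A sketch, assuming dplyr 1.1.0 or later is installed:

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

# For each event, keep only the latest inauguration on or before the event date
res <- events %>%
  left_join(pres, by = join_by(closest(date >= inaugdate)))
res
```

Unlike the grouped SQL query, this keeps events that fall exactly on an inauguration day, and an event before the first inauguration simply gets NA.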

0 votes

Perhaps not efficient (it depends on the number of rows and columns), but here is another way to solve this problem.

library(dplyr) 

pres <- data.frame(pres = c("Ronald Reagan", "George H. W. Bush", 
                            "Bill Clinton", "George W. Bush", "Barack Obama", "Donald Trump"), 
                   inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 
                                           17186), class = "Date")) %>% 
  # Lead the dates to get each term's last day (the day before the next
  # inauguration); the current term runs through today
  mutate(enddt = lead(inaugdate, default = Sys.Date() + 1) - 1)

events <- data.frame(event = c("Challenger explosion", "Chernobyl explosion", "Hurricane Katrina", "9-11"), 
                     date = structure(c(5871, 5959, 13024, 11576), class = "Date"))

# merge() on tables with no common columns returns every combination of rows;
# keep the pairs where the event falls inside the term (both ends inclusive)
newdf <- merge(pres, events, all = TRUE) %>% 
  filter(date >= inaugdate, date <= enddt)
newdf
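Because the cross join keeps every president-event pair that passes the filter, counting the surviving rows per event is a cheap way to confirm that the terms neither overlap nor leave gaps. A minimal base-R sketch of the same cross-join-and-filter, assuming inclusive term ends (the day before the next inauguration, with the last term running through today) and stopping if any event does not match exactly once:

```r
pres <- data.frame(
  pres = c("Ronald Reagan", "George H. W. Bush", "Bill Clinton",
           "George W. Bush", "Barack Obama", "Donald Trump"),
  inaugdate = structure(c(4037, 6959, 8420, 11342, 14264, 17186), class = "Date"),
  stringsAsFactors = FALSE
)
events <- data.frame(
  event = c("Challenger explosion", "Chernobyl explosion",
            "Hurricane Katrina", "9-11"),
  date = structure(c(5871, 5959, 13024, 11576), class = "Date"),
  stringsAsFactors = FALSE
)

pres <- pres[order(pres$inaugdate), ]
# Last day of each term = day before the next inauguration; the last term runs to today
pres$enddt <- c(pres$inaugdate[-1] - 1, Sys.Date())

# merge() with by = NULL returns the Cartesian product of the two tables
newdf <- merge(pres, events, by = NULL)
newdf <- newdf[newdf$date >= newdf$inaugdate & newdf$date <= newdf$enddt, ]

# Each event should survive exactly once: 0 means a gap, more than 1 means overlapping terms
counts <- table(factor(newdf$event, levels = events$event))
stopifnot(all(counts == 1))

newdf[order(newdf$date), c("event", "date", "pres")]
```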