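I have two data frames with the same column names, and I want to subset df1 using df2: keep a row when the person's name matches and the two order times are less than 24 hours apart.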
Name <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
Order_TM <- c("3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 19:30")
df1 <- data.frame(Name, Order_TM)
In df2 I have the same names but different order times.
Name <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
Order_TM <- c("3/4/2020 18:16", "3/4/2020 20:16", "3/6/2020 15:16", "3/5/2020 12:00")
df2 <- data.frame(Name, Order_TM)
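I want to subset df1 to the rows where its Order_TM is less than 24 hours away from the Order_TM in df2. In my example the result would be MCCARTNEY, PAUL; STARR, RINGO; and LENNON, JOHN. However, I haven't been able to find a way to do this.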
# convert data frames to data.tables
library(data.table)
setDT(df1)
setDT(df2)
# convert Order_TM to datetime format
df1[, Order_TM := as.POSIXct(Order_TM, format = '%m/%d/%Y %R')]
df2[, Order_TM := as.POSIXct(Order_TM, format = '%m/%d/%Y %R')]
# join to find difference in hours between datetimes
# (units must be named: the third positional argument of difftime() is tz)
df1[df2, on = .(Name), time_diff := abs(difftime(i.Order_TM, Order_TM, units = 'hours'))]
# subset based on time difference
df1[time_diff < 24]
#               Name            Order_TM  time_diff
# 1: MCCARTNEY, PAUL 2020-03-04 15:16:00  3.0 hours
# 2:     STARR, RNGO 2020-03-04 15:16:00  5.0 hours
# 3:    LENNON, JOHN 2020-03-04 19:30:00 16.5 hours
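One thing worth keeping in mind with difftime(): its third positional argument is tz, not units, and the default units = "auto" picks a unit from the smallest difference in the input. The base R sketch below illustrates why naming units explicitly is the safer habit before comparing against 24. The variable names and the choice of tz = "UTC" are my own assumptions for the illustration, not part of the question.

```r
# Parse three of the question's timestamps; tz = "UTC" is an assumption
# made here to sidestep daylight-saving shifts.
fmt <- "%m/%d/%Y %H:%M"
t1 <- as.POSIXct("3/4/2020 15:16", format = fmt, tz = "UTC")
t2 <- as.POSIXct("3/4/2020 18:16", format = fmt, tz = "UTC")
t3 <- as.POSIXct("3/6/2020 15:16", format = fmt, tz = "UTC")

# With the default units = "auto", the unit depends on the size of the gap.
auto_short <- difftime(t2, t1)  # 3 hour gap, reported in hours
auto_long  <- difftime(t3, t1)  # 48 hour gap, reported in days

# Comparing the raw number against 24 is then misleading for the long gap:
# the stored value is 2 (days), which is below 24.
misleading <- as.numeric(auto_long) < 24

# Naming units keeps every difference on the same scale.
fixed_long <- difftime(t3, t1, units = "hours")
correct <- as.numeric(fixed_long) < 24
```

In the join above all four differences are computed in one vectorised call, so they share a single unit; the risk shows up when the smallest gap in the data is a day or more, or under a minute.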
If you prefer a dplyr solution, you can try this.
library(dplyr)
library(lubridate)
df1 %>%
  left_join(df2, by = "Name") %>%
  # fix the units to hours so the comparison with 24 is always valid
  mutate(Order_TM = Order_TM.x,
         TimeDiff = difftime(mdy_hm(Order_TM.x), mdy_hm(Order_TM.y), units = "hours")) %>%
  filter(abs(TimeDiff) < 24) %>%
  dplyr::select(-Order_TM.y, -Order_TM.x)
#             Name       Order_TM    TimeDiff
#1 MCCARTNEY, PAUL 3/4/2020 15:16  -3.0 hours
#2     STARR, RNGO 3/4/2020 15:16  -5.0 hours
#3    LENNON, JOHN 3/4/2020 19:30 -16.5 hours
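For comparison, the same idea can be sketched in base R without extra packages, assuming each name appears once in each data frame as in the question. The helper parse_tm, the suffixes, and the use of tz = "UTC" are illustrative choices of mine rather than anything from the posts above.

```r
# Rebuild the question's data so the sketch is self-contained.
nm <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
df1 <- data.frame(
  Name = nm,
  Order_TM = c("3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 19:30"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name = nm,
  Order_TM = c("3/4/2020 18:16", "3/4/2020 20:16", "3/6/2020 15:16", "3/5/2020 12:00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse "m/d/Y H:M" strings (UTC is an assumption).
parse_tm <- function(x) as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Inner join on Name; the suffixes mark which data frame each time came from.
merged <- merge(df1, df2, by = "Name", suffixes = c(".df1", ".df2"))

# Absolute gap in hours, stored as a plain number.
merged$TimeDiff <- abs(as.numeric(
  difftime(parse_tm(merged$Order_TM.df2), parse_tm(merged$Order_TM.df1), units = "hours")
))

# Keep rows whose gap is strictly under 24 hours.
result <- merged[merged$TimeDiff < 24, c("Name", "Order_TM.df1", "TimeDiff")]
```

Note that merge() sorts the result by the key by default, so the row order differs from df1; passing sort = FALSE is an option if the order matters.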