Subsetting one of two data frames with the same columns when the time columns are within 24 hours, in R

Question (votes: 0, answers: 2)

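I have two data frames with the same column names. I want to subset df1 using df2, keeping a row when the person's name matches and the two order times are less than 24 hours apart.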

Name <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
Order_TM <- c("3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 15:16", "3/4/2020 19:30")
df1 <- data.frame(Name, Order_TM)

df2 contains the same names but different order times.

Name <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
Order_TM <- c("3/4/2020 18:16", "3/4/2020 20:16", "3/6/2020 15:16", "3/5/2020 12:00")
df2 <- data.frame(Name, Order_TM)

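I want to subset df1 to the rows where Order_TM in df1 and Order_TM in df2 differ by less than 24 hours. For my example, the result would be MCCARTNEY, PAUL; STARR, RINGO; and LENNON, JOHN. However, I have not found a way to do this.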

r time datatable subset
2 Answers
1 vote
# convert data frames to data.tables
library(data.table)
setDT(df1)
setDT(df2)

# convert Order_TM to datetime format
df1[, Order_TM := as.POSIXct(Order_TM, format = '%m/%d/%Y %R')]
df2[, Order_TM := as.POSIXct(Order_TM, format = '%m/%d/%Y %R')]

# join to find difference in hours between datetimes
# (units must be passed by name: the third positional argument of difftime() is tz)
df1[df2, on = .(Name), time_diff := abs(difftime(i.Order_TM, Order_TM, units = 'hours'))]

# subset based on time difference
df1[time_diff < 24]
#               Name            Order_TM  time_diff
# 1: MCCARTNEY, PAUL 2020-03-04 15:16:00  3.0 hours
# 2:     STARR, RNGO 2020-03-04 15:16:00  5.0 hours
# 3:    LENNON, JOHN 2020-03-04 19:30:00 16.5 hours
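
One detail about difftime() is worth checking here. In base R its signature is difftime(time1, time2, tz, units = ...), so units only takes effect when passed by name; otherwise it stays at "auto", and the unit is chosen from the size of the gaps. The sketch below uses two made-up timestamps (not from the question) to show how "auto" can switch to days and slip past a "< 24" check:

```r
# Hypothetical timestamps, exactly two days apart, for illustration only
t1 <- as.POSIXct("2020-03-04 15:16", tz = "UTC")
t2 <- as.POSIXct("2020-03-06 15:16", tz = "UTC")

# Default units = "auto": the gap is at least one day, so days are chosen
auto_diff <- difftime(t2, t1)
units(auto_diff)       # "days"
as.numeric(auto_diff)  # 2, which would wrongly satisfy a "< 24" filter

# Naming units pins the scale to hours regardless of the gap size
hour_diff <- difftime(t2, t1, units = "hours")
units(hour_diff)       # "hours"
as.numeric(hour_diff)  # 48
```

With the question's sample data the smallest gap is 3 hours, so "auto" happens to pick hours, but that is a property of the data rather than of the code.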

1 vote

If you prefer a dplyr solution, you could try this.

library(dplyr)
library(lubridate)
df1 %>% 
  left_join(df2,by = "Name") %>%
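  # fix the unit to hours so the numeric comparison below is always in hours
  mutate(Order_TM = Order_TM.x,
         TimeDiff = difftime(mdy_hm(Order_TM.x), mdy_hm(Order_TM.y), units = "hours")) %>%
  filter(abs(TimeDiff) < 24) %>%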
  dplyr::select(-Order_TM.y,-Order_TM.x)
#             Name       Order_TM    TimeDiff
#1 MCCARTNEY, PAUL 3/4/2020 15:16  -3.0 hours
#2     STARR, RNGO 3/4/2020 15:16  -5.0 hours
#3    LENNON, JOHN 3/4/2020 19:30 -16.5 hours
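
For completeness, the same idea can be sketched in base R without extra packages. This is only one possible approach, not something from either answer. The names parse_tm, merged, hours_apart and result are made up for illustration, the timestamps are assumed to be UTC, and each Name is assumed to appear once per data frame:

```r
# Sample data copied from the question
Name <- c("MCCARTNEY, PAUL", "STARR, RNGO", "HARRISON, GEORGE", "LENNON, JOHN")
df1 <- data.frame(Name, Order_TM = c("3/4/2020 15:16", "3/4/2020 15:16",
                                     "3/4/2020 15:16", "3/4/2020 19:30"))
df2 <- data.frame(Name, Order_TM = c("3/4/2020 18:16", "3/4/2020 20:16",
                                     "3/6/2020 15:16", "3/5/2020 12:00"))

# Parse month/day/year strings; UTC is an assumption that avoids
# daylight-saving shifts distorting the gaps
parse_tm <- function(x) {
  as.POSIXct(as.character(x), format = "%m/%d/%Y %H:%M", tz = "UTC")
}

# Inner join on Name; the clashing Order_TM columns get these suffixes
merged <- merge(df1, df2, by = "Name", suffixes = c(".df1", ".df2"))

# Absolute gap between the two order times, as a plain number of hours
merged$hours_apart <- abs(as.numeric(difftime(parse_tm(merged$Order_TM.df2),
                                              parse_tm(merged$Order_TM.df1),
                                              units = "hours")))

# Keep the df1 rows whose matching df2 order is less than 24 hours away
result <- df1[df1$Name %in% merged$Name[merged$hours_apart < 24], ]
result
```

Filtering df1 by the matched names, rather than returning the merged table, keeps the original df1 columns and row order.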