How do I repeatedly flag all rows that follow a threshold row within a time window and share the same unique identifier?


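I have a data frame with tens of thousands of rows.

I want to flag every row where the Transactions column meets or exceeds a threshold (say 100), and then every subsequent row that occurs within 20 hours of that row and has the same UniqueID as the row that reached the threshold. This needs to happen every time the threshold is reached, for each UniqueID. Rows that meet the condition should be labelled FR; all other rows should be labelled NR.

Essentially, I have 3 relevant columns and want to add a fourth column of categorical data named Flagged.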

library(lubridate)
UniqueID <- c(214123, 214123, 214123, 214123, 987556, 987556, 987556, 987556, 987556)

datetime <- ymd_hms("2021-12-5 21:16:00", "2021-12-6 10:16:00", "2021-12-8 08:16:00", "2021-12-30 01:26:00", "2021-12-5 10:33:00", "2021-12-6 08:16:00", "2021-12-6 13:26:00", "2022-01-6 13:26:00", "2022-01-6 13:26:00")

Transactions <- c(100, 30, 20, 110, 30, 105, 50, 20, 140)

df <- data.frame(UniqueID, datetime, Transactions)

df

UniqueID: an identifier unique to each user

datetime: when the transaction occurred

Transactions: the transaction amount

In the example above, rows 1, 2, 4, 6, 7, and 9 should be flagged FR, and the other rows NR. The final result should look like this:

UniqueID <- c(214123, 214123, 214123, 214123, 987556, 987556, 987556, 987556, 987556)

datetime <- ymd_hms("2021-12-5 21:16:00", "2021-12-6 10:16:00", "2021-12-8 08:16:00", "2021-12-30 01:26:00", "2021-12-5 10:33:00", "2021-12-6 08:16:00", "2021-12-6 13:26:00", "2022-01-6 13:26:00", "2022-01-6 13:26:00")

Transactions <- c(100, 30, 20, 110, 30, 105, 50, 20, 140)

Flagged <- c("FR", "FR", "NR", "FR", "NR", "FR", "FR", "NR", "FR")

df <- data.frame(UniqueID, datetime, Transactions, Flagged)

df

Tags: r, dataframe

1 Answer

dplyr

library(dplyr)
library(tidyr) # fill
# assumes rows are in chronological order within each UniqueID
df %>%
  group_by(UniqueID) %>%
  # timestamp of each threshold row, NA elsewhere (datetime[NA] is a typed NA)
  mutate(last100 = if_else(Transactions >= 100, datetime, datetime[NA])) %>%
  # carry the most recent threshold timestamp forward within each UniqueID
  fill(last100) %>%
  # rows with no earlier threshold row compare as NA and default to "NR"
  mutate(Flagged = coalesce(if_else(difftime(datetime, last100, units = "hours") <= 20, "FR", "NR"), "NR")) %>%
  ungroup() %>%
  select(-last100)
# # A tibble: 9 × 4
#   UniqueID datetime            Transactions Flagged
#      <dbl> <dttm>                     <dbl> <chr>  
# 1   214123 2021-12-05 21:16:00          100 FR     
# 2   214123 2021-12-06 10:16:00           30 FR     
# 3   214123 2021-12-08 08:16:00           20 NR     
# 4   214123 2021-12-30 01:26:00          110 FR     
# 5   987556 2021-12-05 10:33:00           30 NR     
# 6   987556 2021-12-06 08:16:00          105 FR     
# 7   987556 2021-12-06 13:26:00           50 FR     
# 8   987556 2022-01-06 13:26:00           20 NR     
# 9   987556 2022-01-06 13:26:00          140 FR     
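
To see why row 2 falls inside the 20-hour window while row 3 does not, the elapsed time since the threshold row can be checked by hand. The following is a small illustrative check in base R using the timestamps of rows 1 to 3 from the example data; the variable names are made up for this sketch and are not part of the answer's code.

```r
# Timestamps taken from rows 1-3 of the example data (UniqueID 214123).
# All names below are illustrative only.
threshold_row <- as.POSIXct("2021-12-05 21:16:00", tz = "UTC")  # row 1, Transactions = 100
row2 <- as.POSIXct("2021-12-06 10:16:00", tz = "UTC")
row3 <- as.POSIXct("2021-12-08 08:16:00", tz = "UTC")

# Elapsed hours since the threshold row
gap2 <- as.numeric(difftime(row2, threshold_row, units = "hours"))
gap3 <- as.numeric(difftime(row3, threshold_row, units = "hours"))

gap2 <= 20  # TRUE: 13 hours after the threshold row, so FR
gap3 <= 20  # FALSE: 59 hours after the threshold row, so NR
```

Row 4 is flagged again only because it reaches the threshold itself, which starts a new window.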

data.table

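library(data.table)
DT <- as.data.table(df)
# assumes rows are in chronological order within each UniqueID
# 1. record the timestamp of each threshold row (NA elsewhere)
# 2. carry it forward within each UniqueID
# 3. flag rows within 20 hours of it; rows with no earlier threshold row get "NR"
DT[Transactions >= 100, last100 := datetime
  ][, last100 := nafill(last100, type = "locf"), by = .(UniqueID)
  ][, Flagged := fcoalesce(
     fifelse(difftime(datetime, last100, units = "hours") <= 20,
             "FR", "NR"),
     "NR")
  ][, last100 := NULL]
DT[]  # print the result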
#    UniqueID            datetime Transactions Flagged
#       <num>              <POSc>        <num>  <char>
# 1:   214123 2021-12-05 21:16:00          100      FR
# 2:   214123 2021-12-06 10:16:00           30      FR
# 3:   214123 2021-12-08 08:16:00           20      NR
# 4:   214123 2021-12-30 01:26:00          110      FR
# 5:   987556 2021-12-05 10:33:00           30      NR
# 6:   987556 2021-12-06 08:16:00          105      FR
# 7:   987556 2021-12-06 13:26:00           50      FR
# 8:   987556 2022-01-06 13:26:00           20      NR
# 9:   987556 2022-01-06 13:26:00          140      FR
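
If neither package is available, the same idea can probably be expressed in base R. The following is a minimal sketch, not part of the original answer: `flag_rows` and its arguments `threshold` and `window_hours` are hypothetical names, and it assumes the rows are already in chronological order within each UniqueID. It replaces the carry-forward step with a per-group running maximum of the threshold timestamps.

```r
# Example data from the question, rebuilt with base R only.
UniqueID <- c(214123, 214123, 214123, 214123, 987556, 987556, 987556, 987556, 987556)
datetime <- as.POSIXct(c("2021-12-05 21:16:00", "2021-12-06 10:16:00",
                         "2021-12-08 08:16:00", "2021-12-30 01:26:00",
                         "2021-12-05 10:33:00", "2021-12-06 08:16:00",
                         "2021-12-06 13:26:00", "2022-01-06 13:26:00",
                         "2022-01-06 13:26:00"), tz = "UTC")
Transactions <- c(100, 30, 20, 110, 30, 105, 50, 20, 140)
df <- data.frame(UniqueID, datetime, Transactions)

# Hypothetical helper: returns "FR"/"NR" for each row of df.
# Assumes df is sorted by datetime within each UniqueID.
flag_rows <- function(df, threshold = 100, window_hours = 20) {
  secs <- as.numeric(df$datetime)  # seconds since the epoch
  # Timestamp of threshold rows; -Inf marks "no threshold reached here"
  hit <- ifelse(df$Transactions >= threshold, secs, -Inf)
  # Running maximum per UniqueID = most recent threshold timestamp so far
  last_hit <- ave(hit, df$UniqueID, FUN = cummax)
  # Rows with no earlier threshold row have an infinite gap and get "NR"
  ifelse(secs - last_hit <= window_hours * 3600, "FR", "NR")
}

df$Flagged <- flag_rows(df)
df
```

Using `-Inf` as the placeholder avoids a separate NA-handling step, because the gap to a row with no earlier threshold row is infinite and therefore never within the window.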