I have a data frame with products and discounts like this:
product disc1 disc2 dateini1 dateini2 total
p1 10 2 2024-04-05 2024-04-20 2
p2 5 4 2020-01-20 2024-04-15 2
p3 10 NA 2022-10-20 NA 1
p4 10 NA 2024-04-10 NA 1
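I need to find the number of days each discount was available in the reference month (April in this example). There are several different cases, so this is the result I need for my table: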
product disc1 disc2 dateini1 dateini2 ndays1 ndays2
p1 10 2 2024-04-05 2024-04-20 15 11
p2 5 4 2020-01-20 2024-04-15 15 15
p3 10 NA 2022-10-20 NA 30
p4 10 NA 2024-04-10 NA 21
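For the case where only one discount is available, I already get this with the code below. I need to generalize it to products with several discounts, since in other months a given product can have 3 or more discounts.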
per_actual <- as.Date("2024-04-01")
for (i in 1:nrow(df)) {
  if (df$total[i] == 1 & df$dateini1[i] <= per_actual) {
    df$ndays1[i] <- days_in_month(per_actual)
  } else {
    df$ndays1[i] <- as.Date(days_in_month(per_actual) + per_actual) - df$dateini1[i]
  }
}
I need some ideas for automating this, even if that means changing the code I currently have.
Here is a data.table approach.
Sample data
library(data.table)
library(lubridate)
# sample data
mydata <- fread("product disc1 disc2 dateini1 dateini2 total
p1 10 2 2024-04-05 2024-04-20 2
p2 5 4 2020-01-20 2024-04-15 2
p3 10 NA 2022-10-20 NA 1
p4 10 NA 2024-04-10 NA 1")
# or setDT(mydata)
per_actual <- as.IDate("2024-04-01")
Code
# melt to long format, remove incomplete rows
DT <- melt(mydata, id.vars = "product",
           measure.vars = patterns(disc = "^disc", dateini = "^dateini"),
           na.rm = TRUE)
# just to make sure, sort/key by product and dateini
setkey(DT, product, dateini)
# create end of discount period
DT[, end := shift(dateini, type = "lead") - 1, by = product]
# fill in NA
DT[is.na(end), end := ceiling_date(per_actual, unit = "months") - 1]
DT[, disc_interval := interval(start = dateini, end = end)]
# Key: <product, dateini>
# product variable disc dateini end disc_interval
# <char> <fctr> <int> <IDat> <IDat> <Interval>
# 1: p1 1 10 2024-04-05 2024-04-19 2024-04-05 UTC--2024-04-19 UTC
# 2: p1 2 2 2024-04-20 2024-04-30 2024-04-20 UTC--2024-04-30 UTC
# 3: p2 1 5 2020-01-20 2024-04-14 2020-01-20 UTC--2024-04-14 UTC
# 4: p2 2 4 2024-04-15 2024-04-30 2024-04-15 UTC--2024-04-30 UTC
# 5: p3 1 10 2022-10-20 2024-04-30 2022-10-20 UTC--2024-04-30 UTC
# 6: p4 1 10 2024-04-10 2024-04-30 2024-04-10 UTC--2024-04-30 UTC
# get the interval to look at (per_actual until end of month)
per_interval <- lubridate::interval(start = per_actual,
                                    end = ceiling_date(per_actual, unit = "months") - 1)
# calculate the overlap in days (you can suppress warnings if desired)
DT[, ndays := as.duration(lubridate::intersect(disc_interval, per_interval)) |> day()]
# cast to wide again
dcast(DT, product ~ variable, value.var = c("disc", "dateini", "ndays"))
Final output
# Key: <product>
# product disc_1 disc_2 dateini_1 dateini_2 ndays_1 ndays_2
# <char> <int> <int> <IDat> <IDat> <int> <int>
# 1: p1 10 2 2024-04-05 2024-04-20 15 11
# 2: p2 5 4 2020-01-20 2024-04-15 14 16
# 3: p3 10 NA 2022-10-20 <NA> 30 NA
# 4: p4 10 NA 2024-04-10 <NA> 21 NA
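As a cross-check, the same overlap can be sketched with plain integer date arithmetic instead of lubridate intervals. This is only a minimal sketch under a few assumptions: the names `month_start`, `month_end`, `long` and `result` are illustrative and not part of the code above, the sample data is typed in by hand with explicit `IDate` columns, and `value.name` is used to name the melted columns. Each discount is clipped to the reference month and its days are counted inclusively, with 0 for a discount that does not reach the month at all.

```r
library(data.table)

# sample data from the question, with dates stored explicitly as IDate
mydata <- data.table(
  product  = c("p1", "p2", "p3", "p4"),
  disc1    = c(10L, 5L, 10L, 10L),
  disc2    = c(2L, 4L, NA, NA),
  dateini1 = as.IDate(c("2024-04-05", "2020-01-20", "2022-10-20", "2024-04-10")),
  dateini2 = as.IDate(c("2024-04-20", "2024-04-15", NA, NA))
)

# first and last day of the reference month (illustrative names)
month_start <- as.IDate("2024-04-01")
month_end   <- as.IDate(seq(as.Date(month_start), by = "month", length.out = 2)[2]) - 1L

# long format: one row per product and discount, empty discounts dropped
long <- melt(mydata, id.vars = "product",
             measure.vars = patterns("^disc", "^dateini"),
             value.name = c("disc", "dateini"),
             na.rm = TRUE)
setorder(long, product, dateini)

# a discount ends the day before the next one starts, or at month end
long[, end := shift(dateini, type = "lead") - 1L, by = product]
long[is.na(end), end := month_end]

# clip each discount to the month and count days inclusively (never below 0)
long[, ndays := pmax(0L,
                     as.integer(pmin(end, month_end)) -
                       as.integer(pmax(dateini, month_start)) + 1L)]

# back to wide format: disc_1, disc_2, dateini_1, dateini_2, ndays_1, ndays_2
result <- dcast(long, product ~ variable,
                value.var = c("disc", "dateini", "ndays"))
print(result)
```

Because the columns are selected by pattern, a third pair such as `disc3` / `dateini3` should be picked up without further changes, provided the columns keep the same naming scheme.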