Create day-count columns based on the number of existing columns


I have a data frame of products and discounts like this:

product disc1 disc2 dateini1     dateini2   total
p1      10    2     2024-04-05  2024-04-20 2
p2      5     4     2020-01-20  2024-04-15 2
p3      10    NA    2022-10-20  NA         1
p4      10    NA    2024-04-10  NA         1

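I need to find the number of days each discount is available within the reference month (April in this case). I have several different cases, though, so this is the result table I need: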

product disc1 disc2 dateini1     dateini2    ndays1 ndays2
p1      10    2     2024-04-05  2024-04-20  15     11        
p2      5     4     2020-01-20  2024-04-15  15     15
p3      10    NA    2022-10-20  NA          30
p4      10    NA    2024-04-10  NA          21

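For the case where only one discount is available, I have already achieved this with the code below. I need to generalize it to products with multiple discounts, because in other months a given product can have 3 or more discounts.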

library(lubridate)  # provides days_in_month()

per_actual <- as.Date("2024-04-01")

for (i in 1:nrow(df)) {
  if (df$total[i] == 1 & df$dateini1[i] <= per_actual) {
    df$ndays1[i] <- days_in_month(per_actual)
  } else {
    df$ndays1[i] <- as.Date(days_in_month(per_actual) + per_actual) - df$dateini1[i]
  }
}

I need some ideas for automating this, even if that means changing the code I currently have.

r
1 Answer

Here is a data.table approach.

Sample data

library(data.table)
library(lubridate)

# sample data
mydata <- fread("product disc1 disc2 dateini1     dateini2   total
p1      10    2     2024-04-05  2024-04-20 2
p2      5     4     2020-01-20  2024-04-15 2
p3      10    NA    2022-10-20  NA         1
p4      10    NA    2024-04-10  NA         1")
# or setDT(mydata)

per_actual <- as.IDate("2024-04-01")

Code

#melt to long format, remove incomplete rows
DT <- melt(mydata, id.vars = "product", 
     measure.vars = patterns(disc = "^disc", dateini = "^dateini"), 
     na.rm = TRUE)
# just to make sure, sort/key by product and dateini
setkey(DT, product, dateini)
# create end of discount period
DT[, end := shift(dateini, type = "lead") - 1, by = product]
# fill in NA
DT[is.na(end), end := ceiling_date(per_actual, unit = "months") - 1]
DT[, disc_interval := interval(start = dateini, end = end)]
# Key: <product, dateini>
#    product variable  disc    dateini        end                  disc_interval
#     <char>   <fctr> <int>     <IDat>     <IDat>                     <Interval>
# 1:      p1        1    10 2024-04-05 2024-04-19 2024-04-05 UTC--2024-04-19 UTC
# 2:      p1        2     2 2024-04-20 2024-04-30 2024-04-20 UTC--2024-04-30 UTC
# 3:      p2        1     5 2020-01-20 2024-04-14 2020-01-20 UTC--2024-04-14 UTC
# 4:      p2        2     4 2024-04-15 2024-04-30 2024-04-15 UTC--2024-04-30 UTC
# 5:      p3        1    10 2022-10-20 2024-04-30 2022-10-20 UTC--2024-04-30 UTC
# 6:      p4        1    10 2024-04-10 2024-04-30 2024-04-10 UTC--2024-04-30 UTC

# get the interval to look at (per_actual until end of month)
per_interval <- lubridate::interval(start = per_actual, 
                                    end = ceiling_date(per_actual, unit = "months") - 1)
# calculate the overlap in days (you can suppress warnings if desired)
DT[, ndays := as.duration(lubridate::intersect(disc_interval, per_interval)) |> day()]
# cast to wide again
dcast(DT, product ~ variable, value.var = c("disc", "dateini", "ndays"))

Final output

# Key: <product>
#    product disc_1 disc_2  dateini_1  dateini_2 ndays_1 ndays_2
#     <char>  <int>  <int>     <IDat>     <IDat>   <int>   <int>
# 1:      p1     10      2 2024-04-05 2024-04-20      15      11
# 2:      p2      5      4 2020-01-20 2024-04-15      14      16
# 3:      p3     10     NA 2022-10-20       <NA>      30      NA
# 4:      p4     10     NA 2024-04-10       <NA>      21      NA
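
Since the question mentions months with 3 or more discounts per product, here is a minimal sketch of the same melt / shift / dcast idea wrapped in a function. It is not part of the answer above: the helper name `discount_days`, the sample table `promos`, and the `disc3` / `dateini3` columns are hypothetical. It also swaps the lubridate interval overlap for plain integer date arithmetic, so no warnings are raised for discounts that do not overlap the month. It assumes `month_start` is the first day of the reference month and that each discount ends the day before the next one starts, as in the code above.

```r
library(data.table)

# Hypothetical helper: days each discount is active within the month
# that starts at `month_start` (assumed to be the first day of a month).
discount_days <- function(dt, month_start) {
  month_start <- as.IDate(month_start)
  # last day of the month = first day of the next month minus one day
  next_month <- seq(as.Date(month_start), by = "month", length.out = 2)[2]
  month_end <- as.IDate(next_month) - 1L

  # long format: one row per product and discount, dropping missing discounts
  long <- melt(dt, id.vars = "product",
               measure.vars = patterns(disc = "^disc", dateini = "^dateini"),
               na.rm = TRUE)
  long[, dateini := as.IDate(dateini)]
  setkey(long, product, dateini)

  # a discount ends the day before the next one starts, else at month end
  long[, end := shift(dateini, type = "lead") - 1L, by = product]
  long[is.na(end), end := month_end]

  # inclusive overlap with the month, computed on integer day numbers;
  # discounts entirely outside the month get 0
  long[, ndays := pmax(0L,
                       pmin(as.integer(end), as.integer(month_end)) -
                         pmax(as.integer(dateini), as.integer(month_start)) + 1L)]

  # back to wide format: disc_1, dateini_1, ndays_1, ...
  dcast(long, product ~ variable, value.var = c("disc", "dateini", "ndays"))
}

# made-up data with a third discount for p1
promos <- data.table(
  product  = c("p1", "p2", "p3"),
  disc1    = c(10L, 5L, 10L),
  disc2    = c(2L, 4L, NA),
  disc3    = c(1L, NA, NA),
  dateini1 = as.IDate(c("2024-04-05", "2020-01-20", "2022-10-20")),
  dateini2 = as.IDate(c("2024-04-20", "2024-04-15", NA)),
  dateini3 = as.IDate(c("2024-04-25", NA, NA))
)

res <- discount_days(promos, "2024-04-01")
print(res)
```

With this made-up data, p1 should get 15, 5 and 6 days for its three discounts, p2 should get 14 and 16, and p3 should get 30. Passing a different first-of-month date reuses the same logic for other months.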