Get counts within a time range


My data is basically a list of purchases, each with a product, dates, and a customer ID. Sample data can be created as follows:

custId=c('A','A','B','C','A','D','E','F','B','C','F')
ProductPurchase=c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken','milk','Apple','sugar','eggs')
BuyDate=c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017','23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017','21-06-2017')
ExpiryDate=c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022','12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019','21-07-2019')
DummyD=data.frame(custId,ProductPurchase,BuyDate,ExpiryDate)

Printed, the first six rows look like this:

  custId ProductPurchase    BuyDate ExpiryDate
1      A            Milk  1-03-2014  1-03-2017
2      A             Tea  4-05-2017  4-05-2022
3      B            Milk 15-02-2015 15-02-2017
4      C            Eggs 23-04-2014 12-05-2015
5      A          Coffee 12-04-2017 12-04-2022
6      D           sugar  23-5-2016  12-7-2018

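I want to find the customers who bought milk and then bought something again (any product) within +/- 60 days of the milk's expiry date, i.e. anywhere from 60 days before it expires to 60 days after.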

For example, for the data above, the output should look like this:

CustID   BoughtWithin60Days   ProductExpiry  ProductBought  Expiry Date     BuyD
A           yes                     Milk        Coffee      1-03-2017      12-04-2017
B           yes                     Milk         Apple      15-02-2017     2-03-2017
F           yes                     Milk        Eggs        15-06-2017     21-06-2017
Tags: sql, r, plyr, reshape

Answer

This is more of a merge problem than a reshape problem.

Here is a possible solution using "data.table".

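Start with some cleanup. You need proper dates, and you need to make sure your "ProductPurchase" column can be used for matching (the sample has both "Milk" and "milk", for example).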

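library(data.table)
setDT(DummyD)  # convert the data.frame to a data.table in place
# Lower-case the product names so "Milk" and "milk" match, and parse
# both date columns from day-month-year strings
DummyD[, c("ProductPurchase", "BuyDate", "ExpiryDate") := 
         list(tolower(ProductPurchase),
              as.Date(BuyDate, format = "%d-%m-%Y"),
              as.Date(ExpiryDate, format = "%d-%m-%Y"))][]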
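To see why this cleanup matters, here is a minimal base R check on a few values copied from the sample data. It assumes only base R: `as.Date()` with `format = "%d-%m-%Y"` reads the day-month-year strings, including those with a one-digit day or month, and `tolower()` makes "Milk" and "milk" compare equal. The variable names `buy`, `parsed`, and `products` are just for illustration.

```r
# A few BuyDate strings from the sample; some use one-digit days or months
buy <- c("1-03-2014", "23-5-2016", "15-02-2015")
parsed <- as.Date(buy, format = "%d-%m-%Y")
print(parsed)  # "2014-03-01" "2016-05-23" "2015-02-15"

# Product names that differ only in case
products <- c("Milk", "milk", "Eggs", "eggs")
print(tolower(products) == "milk")  # TRUE TRUE FALSE FALSE
```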

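Create a subset of the rows where the purchased product is "milk", and add two columns that mark the window from 60 days before to 60 days after the expiry date.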

milk <- DummyD[ProductPurchase == "milk"][
  , c("Min", "Max") := list(ExpiryDate - 60, ExpiryDate + 60)]
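As a worked example of that window, take customer A's milk, which expires on 2017-03-01. Subtracting and adding 60 days to a `Date` gives the `Min` and `Max` bounds, and a later purchase counts if it falls between them, inclusive. This is a small base R sketch; the coffee and tea dates come from the sample data, while the 2017-02-15 purchase is hypothetical and only added to show that a purchase shortly *before* expiry also qualifies.

```r
# Customer A's milk expiry date (from the sample data)
expiry  <- as.Date("2017-03-01")
win_min <- expiry - 60  # 60 days before expiry
win_max <- expiry + 60  # 60 days after expiry
print(c(win_min, win_max))  # "2016-12-31" "2017-04-30"

# Hypothetical pre-expiry purchase, then A's coffee and tea purchases
later <- as.Date(c("2017-02-15", "2017-04-12", "2017-05-04"))
in_window <- later >= win_min & later <= win_max
print(in_window)  # TRUE TRUE FALSE
```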

Create a subset of all the other products purchased.

others <- DummyD[ProductPurchase != "milk"]

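Merge the two subsets on the "custId" column. Then add an indicator column showing whether the second product was bought within the 60-day window, by checking its purchase date (BuyDate.y) against the previously computed "Min" and "Max" values.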

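# Pair each milk purchase with every other purchase by the same customer, then
# flag purchases that fall inside [Min, Max], i.e. up to 60 days before or after expiry
out <- merge(milk, others, by = "custId")[, within60 := BuyDate.y >= Min & BuyDate.y <= Max][]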
out
#    custId ProductPurchase.x  BuyDate.x ExpiryDate.x        Min        Max
# 1:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 2:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 3:      B              milk 2015-02-15   2017-02-15 2016-12-17 2017-04-16
# 4:      F              milk 2014-05-05   2017-06-15 2017-04-16 2017-08-14
#    ProductPurchase.y  BuyDate.y ExpiryDate.y within60
# 1:               tea 2017-05-04   2022-05-04    FALSE
# 2:            coffee 2017-04-12   2022-04-12     TRUE
# 3:             apple 2017-03-02   2020-03-03     TRUE
# 4:              eggs 2017-06-21   2019-07-21     TRUE

If you only want to return the rows where the value is TRUE, you can use:

out[(within60)]
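If you are not using data.table, the same merge-then-filter idea can be sketched in base R with `merge()` and ordinary subsetting. This is an alternative sketch, not the answer's method: it rebuilds the question's sample data with `stringsAsFactors = FALSE` (an assumption so the columns stay character), uses an inclusive +/- 60 day test against the milk's expiry date, and keeps only the matching rows.

```r
# Sample data from the question
custId <- c('A','A','B','C','A','D','E','F','B','C','F')
ProductPurchase <- c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken','milk','Apple','sugar','eggs')
BuyDate <- c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017','23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017','21-06-2017')
ExpiryDate <- c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022','12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019','21-07-2019')
DummyD <- data.frame(custId, ProductPurchase, BuyDate, ExpiryDate,
                     stringsAsFactors = FALSE)

# Normalise product names and parse day-month-year dates
DummyD$ProductPurchase <- tolower(DummyD$ProductPurchase)
DummyD$BuyDate <- as.Date(DummyD$BuyDate, format = "%d-%m-%Y")
DummyD$ExpiryDate <- as.Date(DummyD$ExpiryDate, format = "%d-%m-%Y")

milk   <- DummyD[DummyD$ProductPurchase == "milk", ]
others <- DummyD[DummyD$ProductPurchase != "milk", ]

# Pair every milk purchase with every other purchase by the same customer;
# shared column names get ".x" (milk) and ".y" (other product) suffixes
out <- merge(milk, others, by = "custId", suffixes = c(".x", ".y"))

# Inclusive window: up to 60 days before or after the milk's expiry date
out$within60 <- out$BuyDate.y >= out$ExpiryDate.x - 60 &
                out$BuyDate.y <= out$ExpiryDate.x + 60

result <- out[out$within60,
              c("custId", "ProductPurchase.y", "ExpiryDate.x", "BuyDate.y")]
print(result)
```

Because `merge()` sorts on `custId` by default, the surviving rows come out in customer order (A, B, F), matching the expected output in the question.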
© www.soinside.com 2019 - 2024. All rights reserved.