My data is essentially a list of purchases containing a product, dates, and a customer ID. Sample data can be created as follows:
custId=c('A','A','B','C','A','D','E','F','B','C','F')
ProductPurchase=c('Milk','Tea','Milk','Eggs','Coffee','sugar','Chicken','milk','Apple','sugar','eggs')
BuyDate=c('1-03-2014','4-05-2017','15-02-2015','23-04-2014','12-04-2017','23-5-2016','13-5-2012','5-05-2014','2-03-2017','03-03-2017','21-06-2017')
ExpiryDate=c('1-03-2017','4-05-2022','15-02-2017','12-05-2015','12-04-2022','12-7-2018','23-06-2015','15-06-2017','3-03-2020','2-05-2019','21-07-2019')
DummyD=data.frame(custId,ProductPurchase,BuyDate,ExpiryDate)
  custId ProductPurchase    BuyDate ExpiryDate
1      A            Milk  1-03-2014  1-03-2017
2      A             Tea  4-05-2017  4-05-2022
3      B            Milk 15-02-2015 15-02-2017
4      C            Eggs 23-04-2014 12-05-2015
5      A          Coffee 12-04-2017 12-04-2022
6      D           sugar  23-5-2016  12-7-2018
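I want to retrieve the customers who bought milk and then bought again (any product) within +/- 60 days of the milk's expiry date (that is, up to 60 days before or after it expires).
For example, for the data above, the output should look like this: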
CustID  BoughtWithin60Days  ProductExpiry  ProductBought  Expiry Date  BuyD
A       yes                 Milk           Coffee         1-03-2017    12-04-2017
B       yes                 Milk           Apple          15-02-2017   2-03-2017
F       yes                 Milk           Eggs           15-06-2017   21-06-2017
This is more of a merge problem than a reshape problem.
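Here is a possible solution using "data.table".
Start with some cleanup. You need proper dates, and you need to make sure your "ProductPurchase" column is usable for filtering and merging (the sample data mixes "Milk" and "milk", so lower-case it).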
library(data.table)
setDT(DummyD)
DummyD[, c("ProductPurchase", "BuyDate", "ExpiryDate") :=
         list(tolower(ProductPurchase),
              as.Date(BuyDate, format = "%d-%m-%Y"),
              as.Date(ExpiryDate, format = "%d-%m-%Y"))][]
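It is worth checking that every date parsed, because as.Date() returns NA without raising an error when a string does not match the format. A minimal sketch in base R, where the vector `raw` is made up for illustration (its third entry is deliberately in the wrong format):

```r
# Hypothetical input: two strings in the d-m-Y layout used above, one that is not
raw <- c("1-03-2014", "23-5-2016", "2014/03/01")

parsed <- as.Date(raw, format = "%d-%m-%Y")

parsed           # single-digit days and months are accepted on input
is.na(parsed)    # the non-matching string becomes NA instead of raising an error

# Show the raw strings that failed to parse, if any
raw[is.na(parsed)]
```

Running the same is.na() check on BuyDate and ExpiryDate after the cleanup step would flag any rows that need attention before merging.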
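Create a subset of the rows where the purchased product is "milk". Add two columns holding the dates 60 days before and 60 days after the expiry date.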
milk <- DummyD[ProductPurchase == "milk"][
, c("Min", "Max") := list(ExpiryDate - 60, ExpiryDate + 60)]
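Adding or subtracting a number from a Date shifts it by that many days, which is what the "Min" and "Max" columns rely on. A small base R sketch of the window check for a single milk purchase, using customer A's dates from the sample data (the variable names here are only for illustration):

```r
expiry <- as.Date("2017-03-01")   # customer A's milk expiry

lo <- expiry - 60                 # start of the window: 2016-12-31
hi <- expiry + 60                 # end of the window:   2017-04-30

# Customer A's other purchases: coffee, then tea
buy <- as.Date(c("2017-04-12", "2017-05-04"))

# TRUE when the purchase date falls inside [lo, hi]
within <- buy >= lo & buy <= hi
within                            # TRUE FALSE
```

The coffee purchase falls inside the window and the tea purchase falls four days after it.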
Create a subset of all the other products purchased.
others <- DummyD[ProductPurchase != "milk"]
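Merge the two subsets on the "custId" column. Then add an indicator column showing whether the second product was bought within the 60-day window, by checking its purchase date (BuyDate.y) against the previously computed "Min" and "Max" values.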
out <- merge(milk, others, by = "custId")[, within60 := BuyDate.y >= Min & BuyDate.y <= Max][]
out
#    custId ProductPurchase.x  BuyDate.x ExpiryDate.x        Min        Max
# 1:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 2:      A              milk 2014-03-01   2017-03-01 2016-12-31 2017-04-30
# 3:      B              milk 2015-02-15   2017-02-15 2016-12-17 2017-04-16
# 4:      F              milk 2014-05-05   2017-06-15 2017-04-16 2017-08-14
#    ProductPurchase.y  BuyDate.y ExpiryDate.y within60
# 1:               tea 2017-05-04   2022-05-04    FALSE
# 2:            coffee 2017-04-12   2022-04-12     TRUE
# 3:             apple 2017-03-02   2020-03-03     TRUE
# 4:              eggs 2017-06-21   2019-07-21     TRUE
If you only want to return the rows where the indicator is "TRUE", you can use:
out[(within60)]
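As an alternative to merging everything and filtering afterwards, the window check could be pushed into the join itself with a non-equi join. This is only a sketch, assuming a data.table version that supports non-equi joins in `on`; the toy table `purchases` and the result name `hits` are made up for illustration, and the output column names simply mirror the ones in the question.

```r
library(data.table)

# Toy data (already cleaned): customers A and B from the sample
purchases <- data.table(
  custId          = c("A", "A", "A", "B", "B"),
  ProductPurchase = c("milk", "tea", "coffee", "milk", "apple"),
  BuyDate    = as.Date(c("2014-03-01", "2017-05-04", "2017-04-12",
                         "2015-02-15", "2017-03-02")),
  ExpiryDate = as.Date(c("2017-03-01", "2022-05-04", "2022-04-12",
                         "2017-02-15", "2020-03-03"))
)

milk <- purchases[ProductPurchase == "milk"]
milk[, c("Min", "Max") := list(ExpiryDate - 60, ExpiryDate + 60)]
others <- purchases[ProductPurchase != "milk"]

# Keep only the other purchases whose BuyDate lies inside [Min, Max]
# for the same customer; nomatch = 0L drops milk rows with no match.
# The x. prefix refers to columns of `others`, the i. prefix to `milk`.
hits <- others[milk,
               on = .(custId, BuyDate >= Min, BuyDate <= Max),
               nomatch = 0L,
               .(custId        = i.custId,
                 ProductExpiry = i.ProductPurchase,
                 ProductBought = x.ProductPurchase,
                 ExpiryDate    = i.ExpiryDate,
                 BuyDate       = x.BuyDate)]
hits
```

The explicit `x.` and `i.` prefixes are there because, in a non-equi join, an unprefixed join column in the result can carry the bound from `milk` rather than the original purchase date.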