我需要找到一个客户已经买了P1,多少天后客户会买P2。我无法找到同一客户从订购P1到下一次订购P2之间的天数。
我的数据如下图所示,我需要找到一个客户已经购买了P1,并且在多少天后客户会购买P2。
Customer ID Order_Date Product
C-87 11/20/2018 P2
C-87 7/25/2018 P1
C-87 7/19/2019 P1
C-87 8/2/2018 P2
C-87 12/9/2019 P1
... ... ... ... ...
C-22 9/22/2018 P2
C-22 9/4/2018 P2
C-22 1/15/2018 P1
C-22 9/5/2019 P2
C-22 3/20/2018 P1 12
试试这个。
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df.sort_values(['ID', 'Order_Date', 'Product'], inplace = True)
df['next_order_date'] = df['Order_Date'].shift(-1)
df['next_product'] = df['Product'].shift(-1)
df['days'] = df.apply(lambda row: row['next_order_date'] - row['Order_Date'] if ((row['next_product'] == 'P2') & (row['Product'] == 'P1')) else np.nan, axis = 1)
结果在
ID Order_Date Product next_oder_date days
7 C-22 2018-01-15 P1 2018-09-04 232 days
6 C-22 2018-09-04 P2 2018-09-22 NaT
5 C-22 2018-09-22 P2 2019-09-05 NaT
8 C-22 2019-09-05 P2 2018-07-25 NaT
1 C-87 2018-07-25 P1 2018-08-02 8 days
3 C-87 2018-08-02 P2 2018-11-20 NaT
0 C-87 2018-11-20 P2 2019-07-19 NaT
2 C-87 2019-07-19 P1 2019-12-09 143 days
4 C-87 2019-12-09 P1 NaT NaT