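I have a table with customer_ids, order_ids, product_id and order_dates. In Python, I want to add a column that holds, for each row, the date of that customer's previous order of the same product.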
customerid orderid productid orderdate
-----------------------------------------------------
1 1 1 2018/01/01
1 1 2 2018/01/01
1 2 3 2018/01/04
1 3 1 2018/01/10
2 5 1 2018/01/14
1 7 3 2018/01/17
2 12 2 2018/01/12
1 20 1 2018/01/23
I want a table like this:
customerid orderid productid orderdate lastorderdate
----------------------------------------------------------------------
1 1 1 2018/01/01 NA
1 1 2 2018/01/01 NA
1 2 3 2018/01/04 NA
1 3 1 2018/01/10 2018/01/01
2 5 1 2018/01/14 NA
1 7 3 2018/01/17 2018/01/04
2 12 2 2018/01/12 NA
2 20 1 2018/01/23 2018/01/14
How can I do this?
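IIUC, you can use:
# assumes orderdate is already a datetime column
df = df.sort_values(['customerid', 'productid', 'orderdate'])
# previous order date within each (productid, customerid) group; NaT for a first purchase
df['last_order'] = df.groupby(['productid', 'customerid'])['orderdate'].shift()
print(df)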
The output is:
customerid orderid productid orderdate last_order
0 1 1 1 2018-01-01 NaT
3 1 3 1 2018-01-10 2018-01-01
7 1 20 1 2018-01-23 2018-01-10
1 1 1 2 2018-01-01 NaT
2 1 2 3 2018-01-04 NaT
5 1 7 3 2018-01-17 2018-01-04
4 2 5 1 2018-01-14 NaT
6 2 12 2 2018-01-12 NaT
You can also use df = df.sort_index() afterwards to put the rows back in their original order.
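To try this end to end, here is a minimal, self-contained sketch built from the sample rows in the question. It assumes the dates are parsed with pd.to_datetime, and it calls groupby(...).shift() directly, which should give the same result as applying a lambda that shifts each group.

```python
import pandas as pd

# Sample rows copied from the question's input table.
df = pd.DataFrame({
    "customerid": [1, 1, 1, 1, 2, 1, 2, 1],
    "orderid": [1, 1, 2, 3, 5, 7, 12, 20],
    "productid": [1, 2, 3, 1, 1, 3, 2, 1],
    "orderdate": ["2018/01/01", "2018/01/01", "2018/01/04", "2018/01/10",
                  "2018/01/14", "2018/01/17", "2018/01/12", "2018/01/23"],
})

# Parse the strings so that sorting is chronological rather than lexical.
df["orderdate"] = pd.to_datetime(df["orderdate"], format="%Y/%m/%d")

# Order each customer/product group by date, then take the previous row's date.
df = df.sort_values(["customerid", "productid", "orderdate"])
df["last_order"] = df.groupby(["customerid", "productid"])["orderdate"].shift()

# Restore the original row order.
df = df.sort_index()
print(df)
```

Rows that are the first purchase of a product by a customer get NaT in last_order.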
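Output based on your data:
# assumes date is already a datetime column
df = df.sort_values(['customer_id', 'product_id', 'date'])
# previous date within each (product_id, customer_id) group
df['last_order'] = df.groupby(['product_id', 'customer_id'])['date'].shift()
print(df.sort_index().head(20))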
row_id date customer_id product_id last_order
0 1 2018-04-07 4 1 NaT
1 2 2018-04-07 4 1 2018-04-07
2 3 2018-04-07 4 1 2018-04-07
3 4 2018-04-07 4 1 2018-04-07
4 5 2018-04-07 4 1 2018-04-07
5 6 2018-04-07 4 1 2018-04-07
6 7 2018-04-07 4 1 2018-04-07
7 8 2018-04-07 4 1 2018-04-07
8 13 2018-04-09 4 1 2018-04-07
9 49 2018-04-13 4 1 2018-04-09
10 106 2018-04-20 4 1 2018-04-13
11 115 2018-04-20 4 1 2018-04-20
12 142 2018-04-27 4 2 NaT
13 143 2018-04-27 4 2 2018-04-27
14 149 2018-04-29 4 2 2018-04-27
15 168 2018-05-02 4 1 2018-04-20
16 169 2018-05-02 4 1 2018-05-02
17 229 2018-05-08 4 5 NaT
18 230 2018-05-08 4 5 2018-05-08
19 231 2018-05-08 4 5 2018-05-08
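Note that in this output, rows sharing a date within a customer/product group receive that same date as last_order, because shift() simply takes the previous row. If the previous distinct date is wanted instead (an assumption about the intent, not something stated in the question), one possible sketch is to shift over the unique dates per group and merge the result back. The column name prev_distinct_date and the shortened sample are hypothetical, chosen only for illustration.

```python
import pandas as pd

# A shortened sample in the same shape as the data (M/D/YYYY date strings).
df = pd.DataFrame({
    "row_id": [1, 2, 13, 49, 142, 143, 149],
    "date": ["4/7/2018", "4/7/2018", "4/9/2018", "4/13/2018",
             "4/27/2018", "4/27/2018", "4/29/2018"],
    "customer_id": [4, 4, 4, 4, 4, 4, 4],
    "product_id": [1, 1, 1, 1, 2, 2, 2],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

keys = ["customer_id", "product_id"]

# One row per distinct purchase date in each customer/product group.
days = df[keys + ["date"]].drop_duplicates().sort_values(keys + ["date"])

# Previous distinct date within each group (NaT for the first date).
days["prev_distinct_date"] = days.groupby(keys)["date"].shift()

# Attach the result to every original row; a left merge keeps the row order of df.
out = df.merge(days, on=keys + ["date"], how="left")
print(out)
```

With this variant, every row on a group's first purchase date gets NaT, and later rows get the most recent earlier date.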
Input data:
row_id date customer_id product_id
1 4/7/2018 4 1
2 4/7/2018 4 1
3 4/7/2018 4 1
4 4/7/2018 4 1
5 4/7/2018 4 1
6 4/7/2018 4 1
7 4/7/2018 4 1
8 4/7/2018 4 1
13 4/9/2018 4 1
49 4/13/2018 4 1
106 4/20/2018 4 1
115 4/20/2018 4 1
142 4/27/2018 4 2
143 4/27/2018 4 2
149 4/29/2018 4 2
168 5/2/2018 4 1
169 5/2/2018 4 1
229 5/8/2018 4 5
230 5/8/2018 4 5
231 5/8/2018 4 5
233 5/9/2018 4 1
237 5/9/2018 4 5
238 5/9/2018 4 5
239 5/9/2018 4 5
240 5/9/2018 4 5
241 5/9/2018 4 5
255 5/14/2018 4 5
256 5/14/2018 4 5
257 5/14/2018 4 5
258 5/14/2018 4 5
259 5/14/2018 4 5
268 5/15/2018 4 5
278 5/17/2018 4 3
293 5/19/2018 4 5
294 5/19/2018 4 5
295 5/19/2018 4 5
296 5/19/2018 4 5
298 5/20/2018 4 5
370 5/21/2018 4 5
371 5/21/2018 4 5
401 5/26/2018 4 2
416 5/30/2018 4 5
417 5/30/2018 4 5
418 5/30/2018 4 5
445 5/31/2018 4 1
446 5/31/2018 4 1
447 5/31/2018 4 1
448 5/31/2018 4 1
449 5/31/2018 4 1
51767 6/13/2018 4 2
51768 6/13/2018 4 2
51769 6/13/2018 4 2
51770 6/13/2018 4 2
51771 6/13/2018 4 2
51772 6/13/2018 4 2
53245 6/19/2018 4 1
53247 6/19/2018 4 1
54773 7/25/2018 4 1
54837 7/26/2018 4 5
54838 7/26/2018 4 5
54891 7/27/2018 4 1
54920 7/28/2018 4 5
54922 7/28/2018 4 5
54979 7/29/2018 4 5
54980 7/29/2018 4 5
54981 7/29/2018 4 5
54982 7/29/2018 4 5
54983 7/29/2018 4 5
54984 7/29/2018 4 5
54985 7/29/2018 4 5
55039 7/30/2018 4 5
55040 7/30/2018 4 5
55041 7/30/2018 4 5
55042 7/30/2018 4 5
55043 7/30/2018 4 5
55044 7/30/2018 4 5
55045 7/30/2018 4 5
55046 7/30/2018 4 5
55537 8/5/2018 4 5
55640 8/6/2018 4 5
55653 8/6/2018 4 5
55654 8/6/2018 4 5
55655 8/6/2018 4 5
55656 8/6/2018 4 5
55658 8/6/2018 4 5
55853 8/8/2018 4 5
55854 8/8/2018 4 5
55855 8/8/2018 4 5
55856 8/8/2018 4 5
55857 8/8/2018 4 5
55858 8/8/2018 4 5
55859 8/8/2018 4 5
55860 8/8/2018 4 5
56011 8/11/2018 4 5