I have a DataFrame that is a time series of user access data.
UserID Access Date
a 10/01/2019
b 10/01/2019
c 10/01/2019
a 10/02/2019
b 10/02/2019
d 10/02/2019
e 10/03/2019
f 10/03/2019
a 10/03/2019
b 10/03/2019
a 10/04/2019
b 10/04/2019
c 10/05/2019
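I have another DataFrame that lists dates, and for each date I want to count the unique UserIDs that appeared over a rolling window of the past 3 days. The expected output would look like this: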
Date Past_3_days_unique_count
10/01/2019 NaN
10/02/2019 NaN
10/03/2019 6
10/04/2019 5
10/05/2019 5
How can I achieve this?
This is fairly straightforward. Let me walk you through it with the snippet below and its comments.
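import pandas as pd
import numpy as np
# Generate some timestamps, four per day
dates = pd.date_range("01-01-2016", "01-10-2016", freq="6h")
# Generate some random user ids in the range 1..4
ids = np.random.randint(1, 5, len(dates))
df = pd.DataFrame({"id": ids, "date": dates})
# Collect the set of unique IDs seen on each day
daily = df.groupby(df["date"].dt.to_period("D"))["id"].apply(set)
# Union the sets over each 3-day window and count the distinct IDs.
# Note: summing daily nunique() with rolling(3).sum() would count a user
# once per day they appear, so it overstates the number of unique users.
windows = (daily.iloc[i - 2:i + 1] for i in range(2, len(daily)))
counts = [np.nan, np.nan] + [len(set().union(*w)) for w in windows]
q = pd.Series(counts, index=daily.index)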
The pandas documentation on groupby is very good.
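To check the idea against the sample data from the question, here is a minimal sketch. The helper name `rolling_unique_count` and its `days` parameter are made up for illustration and are not part of pandas. It assumes the dates are in month/day/year format and treats calendar days with no visits as empty.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "UserID": list("abcabdefababc"),
    "Access Date": ["10/01/2019"] * 3 + ["10/02/2019"] * 3
                   + ["10/03/2019"] * 4 + ["10/04/2019"] * 2 + ["10/05/2019"],
})
df["Access Date"] = pd.to_datetime(df["Access Date"], format="%m/%d/%Y")


def rolling_unique_count(frame, days=3):
    """Hypothetical helper: distinct UserIDs seen in each trailing `days`-day window."""
    # Map each day to the set of users seen on that day
    per_day = {day: set(users) for day, users in frame.groupby("Access Date")["UserID"]}
    # Walk every calendar day in the range so gaps count as empty days
    all_days = pd.date_range(min(per_day), max(per_day), freq="D")
    sets = [per_day.get(day, set()) for day in all_days]
    counts = []
    for i in range(len(sets)):
        if i < days - 1:
            # Not enough history yet for a full window
            counts.append(float("nan"))
        else:
            # Union the daily sets in the window, then count distinct users
            counts.append(len(set().union(*sets[i - days + 1:i + 1])))
    return pd.Series(counts, index=all_days, name="Past_3_days_unique_count")


result = rolling_unique_count(df)
print(result)
```

On this data the first two days are NaN and the remaining days give 6, 5 and 5, which matches the expected output in the question. The explicit loop is simple rather than fast; for large data a vectorized approach would be worth exploring.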