Pandas: one-year rolling count of unique values per group

Question

I have the following DataFrame:

Period      group     ID    
20130101     A        10
20130101     A        20
20130301     A        20
20140101     A        20
20140301     A        30
20140401     A        40

20130101     B        11
20130201     B        21
20130401     B        31
20140401     B        41
20140501     B        51

I need to count how many unique IDs each group has had within the last year. The output I want looks like this:

Period      group     num_ids_last_year
20130101     A            2 # ID 10 and 20 in the last year
20130301     A            2 
20140101     A            2 
20140301     A            2 # ID 30 enters, ID 10 leaves
20140401     A            3 # ID 40 enters

20130101     B            1
20130201     B            2
20130401     B            3
20140401     B            2 # ID 11 and 21 leave 
20140501     B            2 # ID 31 leaves, ID 51 enters

Period is in datetime format. I have tried several things:

df.groupby(['group','Period'])['ID'].nunique() # Get number of IDs by group in a given period.
df.groupby(['group'])['ID'].nunique() # Get total number of IDs by group.

df.set_index('Period').groupby('group')['ID'].rolling(window=1, freq='Y').nunique()

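But the last one is not even possible. Is there a straightforward way to do this? I was thinking of some combination of cumcount(), pd.DateOffset, or ge(df.Period - dt.timedelta(365)), but I can't find the answer.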

Thanks.

Edit: added the fact that a given ID can appear in more than one Period.

python pandas group-by
1 Answer
import pandas as pd
from dateutil.relativedelta import relativedelta

df = df.sort_values(by=['Period'])  # if not already sorted

# Collect the IDs seen in each (Period, group) pair into a list
df1 = (df.groupby(['Period', 'group'])['ID']
         .apply(list)
         .reset_index())

def count_ids_last_year(row):
    # Rows of the same group whose Period lies in [Period - 1 year, Period]
    in_window = ((df1['Period'] >= row['Period'] - relativedelta(years=1))
                 & (df1['Period'] <= row['Period'])
                 & (df1['group'] == row['group']))
    # Flatten the ID lists and count the distinct values
    return df1.loc[in_window, 'ID'].explode().nunique()

df1['num_ids_last_year'] = df1.apply(count_ids_last_year, axis=1)
df1 = (df1.drop(columns='ID')
          .sort_values(by=['group', 'Period'])  # keep Periods ordered within each group
          .reset_index(drop=True))
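
As an alternative sketch (not part of the original answer), the same count can be obtained without a row-wise apply: merge each (group, Period) pair with every observation of the same group, keep the observations that fall inside the closed window [Period - 1 year, Period], and count distinct IDs. The sample data is copied from the question; the names periods, pairs, in_window and result are illustrative. I am assuming the window is inclusive on both ends, which is what the desired output implies. The merge builds every within-group pair, so memory use grows quadratically with group size.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Period": pd.to_datetime(
        ["20130101", "20130101", "20130301", "20140101", "20140301", "20140401",
         "20130101", "20130201", "20130401", "20140401", "20140501"],
        format="%Y%m%d"),
    "group": ["A"] * 6 + ["B"] * 5,
    "ID": [10, 20, 20, 20, 30, 40, 11, 21, 31, 41, 51],
})

# Distinct (group, Period) pairs: one output row for each
periods = df[["group", "Period"]].drop_duplicates()

# Pair every output row with every observation of the same group;
# the observation's date ends up in the column "Period_obs"
pairs = periods.merge(df, on="group", suffixes=("", "_obs"))

# Keep observations inside the closed window [Period - 1 year, Period]
in_window = (
    (pairs["Period_obs"] >= pairs["Period"] - pd.DateOffset(years=1))
    & (pairs["Period_obs"] <= pairs["Period"])
)

# Count distinct IDs per (group, Period)
result = (pairs[in_window]
          .groupby(["group", "Period"])["ID"]
          .nunique()
          .reset_index(name="num_ids_last_year"))

print(result)
```

On the sample data this gives 2, 2, 2, 2, 3 for group A and 1, 2, 3, 2, 2 for group B, matching the output requested in the question.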