Update: Error -> "cannot handle a non-unique multi-index!"
I prepare my DataFrame in Python with the following code:
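import pandas as pd
# Keep only the identifier columns and the date column
df = df_EVENT5_18[['FLEET', 'SUBFLEET', 'AIRCRAFT', 'DTIN']]
# Sort so that dates are in order within each aircraft
df = df.sort_values(['FLEET', 'SUBFLEET', 'AIRCRAFT', 'DTIN'])
# Use the three identifier columns as a (non-unique) MultiIndex
df.set_index(['FLEET', 'SUBFLEET', 'AIRCRAFT'], inplace=True)
# df = df.reset_index()
df['DTIN'] = pd.to_datetime(df['DTIN'])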
But it raises an error on this final line:
df_EVENT5_19 = df.assign(output = df.groupby(['FLEET', 'SUBFLEET', 'AIRCRAFT']).DTIN.apply(lambda x: x.diff()))
This is the error: "cannot handle a non-unique multi-index!"
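A likely cause, stated with some caution: because FLEET/SUBFLEET/AIRCRAFT form a non-unique MultiIndex, the result of groupby(...).apply(lambda x: x.diff()) may come back with an index that pandas has to reindex against df before assign can add it as a column, and reindexing against a non-unique MultiIndex is what raises this error (the exact behaviour varies between pandas versions, e.g. in how group keys are added by apply). One workaround is to call diff directly on the grouped column: GroupBy.diff is a transform, so its result keeps exactly the same index as df and no reindexing is needed. The sketch below uses an invented stand-in for df_EVENT5_18, since the real data is not shown.

```python
import pandas as pd

# Hypothetical stand-in for df_EVENT5_18 (values invented for illustration only).
df_EVENT5_18 = pd.DataFrame({
    'FLEET':    ['A', 'A', 'A', 'B', 'B'],
    'SUBFLEET': ['A1', 'A1', 'A1', 'B1', 'B1'],
    'AIRCRAFT': ['X', 'X', 'Y', 'Z', 'Z'],
    'DTIN':     ['2020-01-01', '2020-01-05', '2020-02-01', '2020-03-01', '2020-03-11'],
})

keys = ['FLEET', 'SUBFLEET', 'AIRCRAFT']
df = df_EVENT5_18[keys + ['DTIN']].copy()
df['DTIN'] = pd.to_datetime(df['DTIN'])       # convert before sorting by date
df = df.sort_values(keys + ['DTIN']).set_index(keys)

# GroupBy.diff returns a Series aligned to df's own (non-unique) index,
# so assign can attach it without reindexing.
df_EVENT5_19 = df.assign(output=df.groupby(level=keys)['DTIN'].diff())
print(df_EVENT5_19)
```

The first row of each aircraft gets NaT because it has no earlier date to subtract.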
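Here is the sample table I am working with:
Columns UI_A, UI_B and UI_C together form a unique identifier.
For each row, I want to calculate the number of days since the previous date for the same unique identifier. In other words, if a row has the same unique identifier as the row above it, the calculation should use the date in that row above.
This logic is a little hard to explain, so I have included the output table I want below. I want to create a "Days since last date" column.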
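The "use the date in the row above" logic can also be written directly as a group-wise shift: take each row's previous date within the same UI_A/UI_B/UI_C identifier, subtract it, and convert the resulting Timedelta to a plain number of days with .dt.days. This is a sketch reusing the sample data from the answer; the column name comes from the question, and returning whole days as numbers (rather than Timedelta values) is an assumption about the desired format.

```python
import pandas as pd

# Sample data copied from the answer's example
data = {
    'UI_A': ['319'] * 10,
    'UI_B': ['131'] * 10,
    'UI_C': ['00319', '00319', '00319', '04001', '04001', '04001',
             '04002', '04002', '04002', '04002'],
    'DATE': ['2012-12-30', '2013-02-05', '2013-02-11', '2009-10-25', '2010-09-08',
             '2011-01-16', '2009-12-02', '2010-09-27', '2011-01-06', '2011-02-09'],
}
keys = ['UI_A', 'UI_B', 'UI_C']
df = pd.DataFrame(data)
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.sort_values(keys + ['DATE']).set_index(keys)

# "The row above" within the same identifier is the group-wise shifted date.
previous_date = df.groupby(level=keys)['DATE'].shift()

# Subtract and keep only the day count; the first row of each identifier
# has no predecessor, so its value is NaN.
df['Days since last date'] = (df['DATE'] - previous_date).dt.days
print(df)
```

The non-missing values (37, 6, 318, 130, 299, 101, 34) match the Timedelta output shown for the answer's code.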
If you are using pandas, you can use assign together with groupby:
import pandas as pd
data = {
    'UI_A': ['319','319','319','319','319','319','319','319','319','319'],
    'UI_B': ['131','131','131','131','131','131','131','131','131','131'],
    'UI_C': ['00319','00319','00319','04001','04001','04001','04002','04002','04002','04002'],
    'DATE': ['2012-12-30','2013-02-05','2013-02-11','2009-10-25','2010-09-08','2011-01-16','2009-12-02','2010-09-27','2011-01-06','2011-02-09']
}
df = pd.DataFrame(data)
# UI_A, UI_B and UI_C together form the identifier
df.set_index(['UI_A','UI_B','UI_C'], inplace=True)
df['DATE'] = pd.to_datetime(df['DATE'])
# Within each identifier, subtract the previous row's date from the current one
df = df.assign(output=df.groupby(['UI_A','UI_B','UI_C']).DATE.apply(lambda x: x.diff()))
Output:
                      DATE    output
UI_A UI_B UI_C
319  131  00319 2012-12-30       NaT
          00319 2013-02-05   37 days
          00319 2013-02-11    6 days
          04001 2009-10-25       NaT
          04001 2010-09-08  318 days
          04001 2011-01-16  130 days
          04002 2009-12-02       NaT
          04002 2010-09-27  299 days
          04002 2011-01-06  101 days
          04002 2011-02-09   34 days