How do I add rows for all dates between two columns?

import pandas as pd

mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
          {'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]

mydata2 = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '10/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '11/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '12/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '13/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '14/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '15/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '10/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '11/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '12/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '13/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '14/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '15/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '16/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '17/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '18/10/2016'},]

df = pd.DataFrame(mydata)
df2 = pd.DataFrame(mydata2)

I can't find an answer on how to turn "df" into "df2". Maybe I'm not phrasing it right.

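I want to take every date between the "Entry Date" and "Exit Date" columns (inclusive) and create one row per date, with that date stored in a new "Date" column.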

Any help would be greatly appreciated.

python datetime pandas resampling melt
2 Answers
10 votes

You can use melt to reshape, then set_index, and drop the column variable:

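# convert columns to datetime (the sample dates are day-first, so state the format explicitly)
df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')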

df2 = pd.melt(df, id_vars='ID', value_name='Date')
df2.Date = pd.to_datetime(df2.Date)
df2.set_index('Date', inplace=True)
df2.drop('variable', axis=1, inplace=True)
print (df2)
            ID
Date          
2016-10-10  10
2016-10-10  20
2016-10-15  10
2016-10-18  20
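
To make the reshaping step easier to follow, here is a minimal sketch of what melt returns before anything is dropped. The name long_df is only illustrative and does not appear in the answer. The variable column records which original column each date came from, and that is the column removed afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['10', '20'],
    'Entry Date': ['10/10/2016', '10/10/2016'],
    'Exit Date': ['15/10/2016', '18/10/2016'],
})
for col in ['Entry Date', 'Exit Date']:
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')

# melt stacks the two date columns into one 'Date' column and keeps
# the name of the source column in a helper column called 'variable'.
long_df = pd.melt(df, id_vars='ID', value_name='Date')
print(long_df)
```

Each ID now has exactly two rows, its entry date and its exit date, which is what the following steps fill in.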

Then groupby, resample, and ffill the missing values:

df3 = df2.groupby('ID').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
print (df3)
         Date  ID
0  2016-10-10  10
1  2016-10-11  10
2  2016-10-12  10
3  2016-10-13  10
4  2016-10-14  10
5  2016-10-15  10
6  2016-10-10  20
7  2016-10-11  20
8  2016-10-12  20
9  2016-10-13  20
10 2016-10-14  20
11 2016-10-15  20
12 2016-10-16  20
13 2016-10-17  20
14 2016-10-18  20
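
The same step can be spelled out one group at a time, which may make it clearer what resample and ffill each contribute. This is only a sketch under the sample data from the question; long_df, pieces, and daily are illustrative names, and the loop is not meant to be faster than the one-liner.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['10', '20'],
    'Entry Date': pd.to_datetime(['10/10/2016', '10/10/2016'], format='%d/%m/%Y'),
    'Exit Date': pd.to_datetime(['15/10/2016', '18/10/2016'], format='%d/%m/%Y'),
})

# Same reshaping as above: one row per (ID, boundary date), indexed by date.
long_df = (pd.melt(df, id_vars='ID', value_name='Date')
             .drop(columns='variable')
             .set_index('Date'))

pieces = []
for _, group in long_df.groupby('ID'):
    # Upsample this ID's two boundary dates to daily frequency;
    # ffill copies the ID into the newly created in-between days.
    pieces.append(group.sort_index().resample('D').ffill())

daily = pd.concat(pieces).reset_index()
print(daily)
```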

Finally, merge with the original DataFrame:

print (pd.merge(df, df3))
   Entry Date  Exit Date  ID       Date
0  2016-10-10 2016-10-15  10 2016-10-10
1  2016-10-10 2016-10-15  10 2016-10-11
2  2016-10-10 2016-10-15  10 2016-10-12
3  2016-10-10 2016-10-15  10 2016-10-13
4  2016-10-10 2016-10-15  10 2016-10-14
5  2016-10-10 2016-10-15  10 2016-10-15
6  2016-10-10 2016-10-18  20 2016-10-10
7  2016-10-10 2016-10-18  20 2016-10-11
8  2016-10-10 2016-10-18  20 2016-10-12
9  2016-10-10 2016-10-18  20 2016-10-13
10 2016-10-10 2016-10-18  20 2016-10-14
11 2016-10-10 2016-10-18  20 2016-10-15
12 2016-10-10 2016-10-18  20 2016-10-16
13 2016-10-10 2016-10-18  20 2016-10-17
14 2016-10-10 2016-10-18  20 2016-10-18

3 votes

With newer versions of pandas (1.1 or later), you can use the explode function to generate the dates:

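# the sample dates are day-first, so state the format explicitly
df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')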

(df.assign(Date = [pd.date_range(start, end) 
                   for start, end 
                   in zip(df['Entry Date'], df['Exit Date'])]
          )
   .explode('Date', ignore_index = True)
)
    ID Entry Date  Exit Date       Date
0   10 2016-10-10 2016-10-15 2016-10-10
1   10 2016-10-10 2016-10-15 2016-10-11
2   10 2016-10-10 2016-10-15 2016-10-12
3   10 2016-10-10 2016-10-15 2016-10-13
4   10 2016-10-10 2016-10-15 2016-10-14
5   10 2016-10-10 2016-10-15 2016-10-15
6   20 2016-10-10 2016-10-18 2016-10-10
7   20 2016-10-10 2016-10-18 2016-10-11
8   20 2016-10-10 2016-10-18 2016-10-12
9   20 2016-10-10 2016-10-18 2016-10-13
10  20 2016-10-10 2016-10-18 2016-10-14
11  20 2016-10-10 2016-10-18 2016-10-15
12  20 2016-10-10 2016-10-18 2016-10-16
13  20 2016-10-10 2016-10-18 2016-10-17
14  20 2016-10-10 2016-10-18 2016-10-18
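
For reuse, the explode approach could be wrapped in a small helper. The sketch below is one possible shape, not part of the answer: the function name expand_dates and its parameters are hypothetical, and it assumes the day-first string format used in the question. It leaves the original columns as strings and writes "Date" back in the same format, so the result lines up with the df2 the question asks for.

```python
import pandas as pd

df = pd.DataFrame([
    {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
    {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'},
])

def expand_dates(frame, start_col='Entry Date', end_col='Exit Date', fmt='%d/%m/%Y'):
    """Return one row per calendar day from start_col to end_col, inclusive."""
    starts = pd.to_datetime(frame[start_col], format=fmt)
    ends = pd.to_datetime(frame[end_col], format=fmt)
    # One list of daily timestamps per input row.
    ranges = [list(pd.date_range(s, e)) for s, e in zip(starts, ends)]
    out = frame.assign(Date=ranges).explode('Date', ignore_index=True)
    # explode yields an object column, so convert back to datetime before formatting.
    out['Date'] = pd.to_datetime(out['Date']).dt.strftime(fmt)
    return out

expanded = expand_dates(df)
print(expanded)
```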

A faster option is to use conditional_join from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

# the sample dates are day-first, so state the format explicitly
df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')
dates = pd.date_range(df['Entry Date'].min(), df['Exit Date'].max())
dates = pd.Series(dates,name='Date')

(df
.conditional_join(
    dates, 
    ('Entry Date', 'Date', '<='), 
    ('Exit Date', 'Date', '>='))
)
    ID Entry Date  Exit Date       Date
0   10 2016-10-10 2016-10-15 2016-10-10
1   10 2016-10-10 2016-10-15 2016-10-11
2   10 2016-10-10 2016-10-15 2016-10-12
3   10 2016-10-10 2016-10-15 2016-10-13
4   10 2016-10-10 2016-10-15 2016-10-14
5   10 2016-10-10 2016-10-15 2016-10-15
6   20 2016-10-10 2016-10-18 2016-10-10
7   20 2016-10-10 2016-10-18 2016-10-11
8   20 2016-10-10 2016-10-18 2016-10-12
9   20 2016-10-10 2016-10-18 2016-10-13
10  20 2016-10-10 2016-10-18 2016-10-14
11  20 2016-10-10 2016-10-18 2016-10-15
12  20 2016-10-10 2016-10-18 2016-10-16
13  20 2016-10-10 2016-10-18 2016-10-17
14  20 2016-10-10 2016-10-18 2016-10-18
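
To illustrate what the two inequality conditions select, the same result can be sketched in plain pandas with a cross join followed by a filter. This is not how pyjanitor implements conditional_join, and it builds every row/date pair first, so it is likely to be slower and use more memory on large inputs. It assumes pandas 1.2 or later for how='cross'; pairs, in_range, and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['10', '20'],
    'Entry Date': pd.to_datetime(['10/10/2016', '10/10/2016'], format='%d/%m/%Y'),
    'Exit Date': pd.to_datetime(['15/10/2016', '18/10/2016'], format='%d/%m/%Y'),
})

# Every calendar day covered by at least one row.
dates = pd.Series(
    pd.date_range(df['Entry Date'].min(), df['Exit Date'].max()), name='Date')

# Pair every row with every candidate date, then keep only the pairs
# where Entry Date <= Date <= Exit Date.
pairs = df.merge(dates.to_frame(), how='cross')
in_range = (pairs['Entry Date'] <= pairs['Date']) & (pairs['Date'] <= pairs['Exit Date'])
result = pairs[in_range].reset_index(drop=True)
print(result)
```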