Why convert the time column to the index in a pandas DataFrame?


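When working with a pandas DataFrame in which one column holds timestamps, why do we usually convert that column to the index? What is the benefit?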

I can set the index, but I'm not sure what I gain from it.

pandas indexing
1 Answer

Suppose you have this DataFrame:

import pandas as pd
import numpy as np

# 'h' is the hourly alias; the uppercase 'H' is deprecated as of pandas 2.2
dti = pd.date_range('2024-01-27', periods=24, freq='h')
temp = np.random.randint(10, 25, len(dti))
df = pd.DataFrame({'DateTime': dti, 'Temperature': temp})
print(df)

              DateTime  Temperature
0  2024-01-27 00:00:00           10
1  2024-01-27 01:00:00           21
2  2024-01-27 02:00:00           24
3  2024-01-27 03:00:00           18
4  2024-01-27 04:00:00           12
5  2024-01-27 05:00:00           16
6  2024-01-27 06:00:00           16
7  2024-01-27 07:00:00           15
8  2024-01-27 08:00:00           10
9  2024-01-27 09:00:00           20
10 2024-01-27 10:00:00           12
11 2024-01-27 11:00:00           18
12 2024-01-27 12:00:00           24
13 2024-01-27 13:00:00           17
14 2024-01-27 14:00:00           21
15 2024-01-27 15:00:00           15
16 2024-01-27 16:00:00           20
17 2024-01-27 17:00:00           16
18 2024-01-27 18:00:00           14
19 2024-01-27 19:00:00           12
20 2024-01-27 20:00:00           24
21 2024-01-27 21:00:00           24
22 2024-01-27 22:00:00           16
23 2024-01-27 23:00:00           16

To select the rows between 13:00 and 17:00 (inclusive), you have to build a boolean mask on the column:

>>> df[df['DateTime'].between('2024-01-27 13:00:00', '2024-01-27 17:00:00')]
              DateTime  Temperature
13 2024-01-27 13:00:00           17
14 2024-01-27 14:00:00           21
15 2024-01-27 15:00:00           15
16 2024-01-27 16:00:00           20
17 2024-01-27 17:00:00           16
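
As a side note on what that mask does: `Series.between` is inclusive on both ends by default, which is why the 17:00 row is part of the result. A minimal sketch, assuming a frame shaped like the one above (the seeded generator and the names `start`, `end`, `by_between` and `by_mask` are only for illustration, so the temperatures will differ from the output shown):

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one above; the fixed seed is an assumption
# added here so the example is repeatable.
rng = np.random.default_rng(0)
dti = pd.date_range('2024-01-27', periods=24, freq='h')
df = pd.DataFrame({'DateTime': dti, 'Temperature': rng.integers(10, 25, len(dti))})

start = pd.Timestamp('2024-01-27 13:00:00')
end = pd.Timestamp('2024-01-27 17:00:00')

# Series.between includes both endpoints by default...
by_between = df[df['DateTime'].between(start, end)]
# ...so it selects the same rows as an explicit pair of comparisons.
by_mask = df[(df['DateTime'] >= start) & (df['DateTime'] <= end)]

print(by_between.equals(by_mask))
print(len(by_between))
```

Either way, the filter has to go through the `DateTime` column explicitly.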

And suppose you want to resample the values (3-hour means):

>>> df.resample('3h', on='DateTime').mean().reset_index()
             DateTime  Temperature
0 2024-01-27 00:00:00    18.333333
1 2024-01-27 03:00:00    15.333333
2 2024-01-27 06:00:00    13.666667
3 2024-01-27 09:00:00    16.666667
4 2024-01-27 12:00:00    20.666667
5 2024-01-27 15:00:00    17.000000
6 2024-01-27 18:00:00    16.666667
7 2024-01-27 21:00:00    18.666667

Now let's set the DateTime column as the index:

>>> df = df.set_index('DateTime')
>>> df
                     Temperature
DateTime                        
2024-01-27 00:00:00           10
2024-01-27 01:00:00           21
2024-01-27 02:00:00           24
2024-01-27 03:00:00           18
2024-01-27 04:00:00           12
2024-01-27 05:00:00           16
2024-01-27 06:00:00           16
2024-01-27 07:00:00           15
2024-01-27 08:00:00           10
2024-01-27 09:00:00           20
2024-01-27 10:00:00           12
2024-01-27 11:00:00           18
2024-01-27 12:00:00           24
2024-01-27 13:00:00           17
2024-01-27 14:00:00           21
2024-01-27 15:00:00           15
2024-01-27 16:00:00           20
2024-01-27 17:00:00           16
2024-01-27 18:00:00           14
2024-01-27 19:00:00           12
2024-01-27 20:00:00           24
2024-01-27 21:00:00           24
2024-01-27 22:00:00           16
2024-01-27 23:00:00           16

And perform the same two operations:

>>> df['2024-01-27 13:00:00':'2024-01-27 17:00:00']
                     Temperature
DateTime                        
2024-01-27 13:00:00           17
2024-01-27 14:00:00           21
2024-01-27 15:00:00           15
2024-01-27 16:00:00           20
2024-01-27 17:00:00           16

>>> df.resample('3h').mean()
                     Temperature
DateTime                        
2024-01-27 00:00:00    18.333333
2024-01-27 03:00:00    15.333333
2024-01-27 06:00:00    13.666667
2024-01-27 09:00:00    16.666667
2024-01-27 12:00:00    20.666667
2024-01-27 15:00:00    17.000000
2024-01-27 18:00:00    16.666667
2024-01-27 21:00:00    18.666667

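This is clearly the more concise way to write it: with a DatetimeIndex, a plain label slice replaces the boolean mask, and resample needs neither on= nor reset_index().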
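
If you want to convince yourself that the two styles really give the same result, here is a minimal sketch under the same assumptions as the example frame (the seed and the variable names such as `indexed`, `sel_col` and `res_idx` are mine, not part of the original code). It also uses `DataFrame.between_time`, one of the methods that only work when the index is a DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Same shape of data as in the answer; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
dti = pd.date_range('2024-01-27', periods=24, freq='h')
df = pd.DataFrame({'DateTime': dti, 'Temperature': rng.integers(10, 25, len(dti))})
indexed = df.set_index('DateTime')

# Column-based versions: boolean mask, and resample with on=.
sel_col = df[df['DateTime'].between('2024-01-27 13:00:00', '2024-01-27 17:00:00')]
res_col = df.resample('3h', on='DateTime').mean()

# Index-based versions: label slice, and plain resample.
sel_idx = indexed.loc['2024-01-27 13:00:00':'2024-01-27 17:00:00']
res_idx = indexed.resample('3h').mean()

same_selection = sel_col.set_index('DateTime').equals(sel_idx)
same_resample = res_col.equals(res_idx)
print(same_selection, same_resample)

# between_time filters on time of day and requires a DatetimeIndex.
afternoon = indexed.between_time('13:00', '17:00')
print(len(afternoon))
```

Here I used `.loc` for the slice, which makes it explicit that rows are being selected by label; the bare `df[...]` slice from the answer selects the same rows.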
