Resampling with Pandas over a time range longer than the original

Question (votes: 1, answers: 2)

I have the following daily pricing data:

2017-06-01  15.00
2017-06-02  20.00

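I want to resample it into hourly prices covering 35 hours. Each sample in the first 24 hours should therefore have the value 15.00, and the price from hour 24 up to hour 35 should be 20.00.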

2017-06-01 00:00    15.00
2017-06-01 01:00    15.00
2017-06-01 02:00    15.00
…
2017-06-01 23:00    15.00
2017-06-02 00:00    20.00
2017-06-02 01:00    20.00
2017-06-02 02:00    20.00
…
2017-06-02 10:00    20.00

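I tried resample('3600S').pad(), but it does not work. Is it possible to create the new date range separately and use it as input to the resampling function? resample() does not seem to do the job here.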

python pandas sampling
2 Answers
0 votes

You can create a custom date range at hourly frequency and reindex to it:

import pandas as pd

df.index = pd.to_datetime(df.index)
# 35 hourly timestamps, starting at the first date in the data
rng = pd.date_range(start=df.index.min(), periods=35, freq='H')
df.reindex(rng).ffill()

                    val
2017-06-01 00:00:00 15.0
2017-06-01 01:00:00 15.0
2017-06-01 02:00:00 15.0
2017-06-01 03:00:00 15.0
2017-06-01 04:00:00 15.0
2017-06-01 05:00:00 15.0
2017-06-01 06:00:00 15.0
2017-06-01 07:00:00 15.0
2017-06-01 08:00:00 15.0
2017-06-01 09:00:00 15.0
2017-06-01 10:00:00 15.0
2017-06-01 11:00:00 15.0
2017-06-01 12:00:00 15.0
2017-06-01 13:00:00 15.0
2017-06-01 14:00:00 15.0
2017-06-01 15:00:00 15.0
2017-06-01 16:00:00 15.0
2017-06-01 17:00:00 15.0
2017-06-01 18:00:00 15.0
2017-06-01 19:00:00 15.0
2017-06-01 20:00:00 15.0
2017-06-01 21:00:00 15.0
2017-06-01 22:00:00 15.0
2017-06-01 23:00:00 15.0
2017-06-02 00:00:00 20.0
2017-06-02 01:00:00 20.0
2017-06-02 02:00:00 20.0
2017-06-02 03:00:00 20.0
2017-06-02 04:00:00 20.0
2017-06-02 05:00:00 20.0
2017-06-02 06:00:00 20.0
2017-06-02 07:00:00 20.0
2017-06-02 08:00:00 20.0
2017-06-02 09:00:00 20.0
2017-06-02 10:00:00 20.0
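
For comparison, here is a minimal self-contained sketch of why resample() alone comes up short while the reindex approach does not. The two-row DataFrame is a reconstruction of the question's data, with the column name `val` taken from the output above, and `pd.offsets.Hour()` is used in place of the 'H' alias only because newer pandas releases deprecate that alias. Treat it as an illustration under those assumptions, not as the only way to do this.

```python
import pandas as pd

# Assumed reconstruction of the question's data; "val" is the column name
# that appears in the output above.
df = pd.DataFrame(
    {"val": [15.0, 20.0]},
    index=pd.to_datetime(["2017-06-01", "2017-06-02"]),
)

hour = pd.offsets.Hour()  # one-hour offset, same meaning as the 'H' alias

# resample() only spans the first to the last existing timestamp,
# so it cannot produce rows after 2017-06-02 00:00.
resampled = df.resample(hour).ffill()

# An explicitly built 35-hour index can run past the last timestamp;
# ffill() then carries each daily price forward.
rng = pd.date_range(start=df.index.min(), periods=35, freq=hour)
hourly = df.reindex(rng).ffill()

print(len(resampled), resampled.index[-1])
print(len(hourly), hourly.index[-1])
```

The resampled frame ends at the last date present in the data, while the reindexed frame continues for as many periods as requested.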

0 votes

Another approach is to (a) resample without aggregation, (b) compute the row-wise hourly difference, and then (c) use np.where to conditionally set the value column.

Sample data

import pandas as pd

d = {'date':['2017-06-01','2017-06-02', '2017-06-03'], 'value':[15,20,30]}
df = pd.DataFrame.from_dict(d)
print(df)

         date  value
0  2017-06-01     15
1  2017-06-02     20
2  2017-06-03     30

from numpy import where, timedelta64
df['date'] = pd.to_datetime(df['date'])
# Upsample to hourly rows without aggregation, then keep the first 35 hours
df = df.set_index('date').asfreq("H").iloc[:35,:]
# Get time difference in hours, relative to 1st row
df['hours'] = ((df.index - df.index[0])/timedelta64(1, 'h')).astype(int)
# Conditionally set 'value' column: 15 for the first 24 hours, 20 afterwards
df['value'] = where(df['hours']<24, 15, 20)
print(df)

Output

                     value  hours
date                             
2017-06-01 00:00:00     15      0
2017-06-01 01:00:00     15      1
2017-06-01 02:00:00     15      2
2017-06-01 03:00:00     15      3
2017-06-01 04:00:00     15      4
2017-06-01 05:00:00     15      5
2017-06-01 06:00:00     15      6
2017-06-01 07:00:00     15      7
2017-06-01 08:00:00     15      8
2017-06-01 09:00:00     15      9
2017-06-01 10:00:00     15     10
2017-06-01 11:00:00     15     11
2017-06-01 12:00:00     15     12
2017-06-01 13:00:00     15     13
2017-06-01 14:00:00     15     14
2017-06-01 15:00:00     15     15
2017-06-01 16:00:00     15     16
2017-06-01 17:00:00     15     17
2017-06-01 18:00:00     15     18
2017-06-01 19:00:00     15     19
2017-06-01 20:00:00     15     20
2017-06-01 21:00:00     15     21
2017-06-01 22:00:00     15     22
2017-06-01 23:00:00     15     23
2017-06-02 00:00:00     20     24
2017-06-02 01:00:00     20     25
2017-06-02 02:00:00     20     26
2017-06-02 03:00:00     20     27
2017-06-02 04:00:00     20     28
2017-06-02 05:00:00     20     29
2017-06-02 06:00:00     20     30
2017-06-02 07:00:00     20     31
2017-06-02 08:00:00     20     32
2017-06-02 09:00:00     20     33
2017-06-02 10:00:00     20     34
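
The np.where call above hard-codes the two price levels, so it would need to be rewritten for other data. As a hedged alternative sketch (not part of the original answer), the same three-day sample can be upsampled with `asfreq(..., method='ffill')`, which carries each day's own price forward. The name `hourly` and the use of `pd.offsets.Hour()` in place of the "H" alias are choices made for this illustration.

```python
import pandas as pd

d = {'date': ['2017-06-01', '2017-06-02', '2017-06-03'], 'value': [15, 20, 30]}
df = pd.DataFrame(d)
df['date'] = pd.to_datetime(df['date'])

hour = pd.offsets.Hour()  # one-hour offset, same meaning as the "H" alias

# Upsample to hourly rows and forward-fill each day's price,
# then keep only the first 35 hours.
hourly = df.set_index('date').asfreq(hour, method='ffill').iloc[:35, :]

print(hourly.head(3))
print(hourly.tail(3))
```

Because the fill is driven by the data rather than by literal values, adding more days or changing prices requires no change to the code.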

Edit

Instead of

df = df.set_index('date').asfreq("H").iloc[:35,:]

you can also use

df = df.set_index('date').asfreq("H")
df = df.loc[pd.date_range(start=df.index[0], periods=35, freq='H'),['value']]
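
A small sketch to check that the positional and the label-based selection pick the same 35 rows, assuming the three-day sample data from this answer. The names `by_position` and `by_label` are illustrative only, and `pd.offsets.Hour()` stands in for the 'H' alias.

```python
import pandas as pd

d = {'date': ['2017-06-01', '2017-06-02', '2017-06-03'], 'value': [15, 20, 30]}
df = pd.DataFrame(d)
df['date'] = pd.to_datetime(df['date'])

hour = pd.offsets.Hour()  # one-hour offset, same meaning as the 'H' alias

# Hourly rows from the first to the last date; gaps between days are NaN.
hourly = df.set_index('date').asfreq(hour)

# Variant 1: first 35 rows by position.
by_position = hourly.iloc[:35, :]

# Variant 2: the same 35 timestamps selected by label.
rng = pd.date_range(start=hourly.index[0], periods=35, freq=hour)
by_label = hourly.loc[rng, ['value']]

# True if both variants cover exactly the same timestamps.
print(by_position.index.equals(by_label.index))
```

The label-based form states the intended window explicitly, which can be easier to read than a bare row count.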