I have the following df:
Start_Date End_Date Relevant Volume
2024-10-01 2024-12-31 False 0.000000
2025-01-01 2025-03-31 True 0.097989
2025-04-01 2025-06-30 True -0.014449
2025-01-01 2025-12-31 True 0.195327
2026-01-01 2026-12-31 False 0.000000
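I need an hourly index that starts at the earliest start date and ends at the latest end date among the rows where Relevant == True. I build it like this:
import pandas as pd
# keep only the relevant rows; .copy() avoids a SettingWithCopyWarning on later assignment
relevant_df = df[df['Relevant']].copy()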
earliest_start = relevant_df['Start_Date'].min()
latest_end = relevant_df['End_Date'].max()
# Create DateTime index
# 'h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
date_range = pd.date_range(start=earliest_start, end=latest_end, freq='h')
aggregated_volumes = pd.Series(index=date_range, dtype=float)
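Now, how do I take the volume of each period and add them together, so that in this example the hourly volume for the first three months of 2025 equals 0.097989 + 0.195327, for the second quarter -0.014449 + 0.195327, and so on?
Since your intervals overlap, I believe there is no direct way to index your values.
However, you can build an NxM numpy array (N: number of hourly timestamps, M: number of rows where Relevant is True) and sum across the columns: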
import numpy as np
# ensure datetime
relevant_df[['Start_Date', 'End_Date']] = relevant_df[['Start_Date', 'End_Date']].apply(pd.to_datetime)
# compute a mask of the values between start/end
idx = date_range.to_numpy()[:, None]
m1 = idx>=relevant_df['Start_Date'].to_numpy()
m2 = idx<relevant_df['End_Date'].to_numpy()
# broadcast the values, sum, convert to Series
out = pd.Series(np.nansum(np.where(m1&m2, relevant_df['Volume'].to_numpy(), np.nan), axis=1),
index=date_range)
Output:
2025-01-01 00:00:00 0.293316
2025-01-01 01:00:00 0.293316
2025-01-01 02:00:00 0.293316
2025-01-01 03:00:00 0.293316
2025-01-01 04:00:00 0.293316
...
2025-12-30 20:00:00 0.195327
2025-12-30 21:00:00 0.195327
2025-12-30 22:00:00 0.195327
2025-12-30 23:00:00 0.195327
2025-12-31 00:00:00 0.000000
Freq: h, Length: 8737, dtype: float64
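For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same broadcasting idea. The helper name `sum_overlapping_volumes` and its `freq` parameter are hypothetical, introduced only for illustration, and the sample frame is retyped from the question. It multiplies the boolean mask by the volumes instead of going through `np.where`/`np.nansum`, which should give the same totals.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    "Start_Date": ["2024-10-01", "2025-01-01", "2025-04-01", "2025-01-01", "2026-01-01"],
    "End_Date": ["2024-12-31", "2025-03-31", "2025-06-30", "2025-12-31", "2026-12-31"],
    "Relevant": [False, True, True, True, False],
    "Volume": [0.0, 0.097989, -0.014449, 0.195327, 0.0],
})


def sum_overlapping_volumes(df, freq="h"):
    """Hypothetical helper: per-timestamp sum of Volume over the relevant intervals."""
    rel = df.loc[df["Relevant"]].copy()
    rel["Start_Date"] = pd.to_datetime(rel["Start_Date"])
    rel["End_Date"] = pd.to_datetime(rel["End_Date"])

    # Index spanning the earliest start to the latest end of the relevant rows
    index = pd.date_range(rel["Start_Date"].min(), rel["End_Date"].max(), freq=freq)

    # Column vector of timestamps (N, 1) broadcast against M interval bounds -> (N, M) mask
    stamps = index.to_numpy()[:, None]
    mask = (stamps >= rel["Start_Date"].to_numpy()) & (stamps < rel["End_Date"].to_numpy())

    # True counts as 1 and False as 0, so this keeps only the volumes of covering intervals
    totals = (mask * rel["Volume"].to_numpy()).sum(axis=1)
    return pd.Series(totals, index=index)


out = sum_overlapping_volumes(df)
print(out)
```

Note that `End_Date` is parsed as midnight and the comparison is strict (`<`), so the hours of each interval's final day are not counted; that is why the last row of the output above is 0.000000. If the end dates are meant to be inclusive, one option is to add `pd.Timedelta(days=1)` to `End_Date` before building the index and the mask. The (N, M) array grows with both the number of hours and the number of intervals, so this is best suited to a modest number of intervals.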