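I have a DataFrame containing values at 5-minute intervals over several days. I want to find the highest value between 09:30 and 09:55 for each day. This code gives me the highest value in that time window across all days, but I need it per day: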
import pandas as pd
import yfinance as yf
# Get the data
data = yf.download(tickers="MSFT", period="5d", interval="5m")
df = pd.DataFrame(data)
df = df.between_time('9:30', '09:55')
x = df.High.idxmax(axis=0)
print(x, df.High[x])
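What I tried: building a list of unique trading days and then looping over it. This seems to work, but it also seems very complicated and error-prone. Is there a simpler solution, ideally one that doesn't need a loop?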
from datetime import date, datetime, timedelta

# Build a list of trading days (MON-FRI)
start_date = date(2024, 3, 11)
end_date = date(2024, 3, 16)
ONE_DAY = timedelta(days=1)
monToFri = set([0,1,2,3,4])

def gen_trading_days(start, end):
    thisDay = start
    while thisDay < end:
        dow = thisDay.weekday()
        # Skip forward over Saturday and Sunday
        while dow not in monToFri:
            thisDay += ONE_DAY
            dow = thisDay.weekday()
        yield thisDay
        thisDay += ONE_DAY

trading_days = []
for i in gen_trading_days(start_date, end_date):
    trading_days.append(str(i))

# Loop through list of trading days and filter for day and time window
for i in range(len(trading_days)):
    startDate = trading_days[i]
    #print(startDate)
    dt_startDate = datetime.strptime(startDate, '%Y-%m-%d').date()
    dt_range = pd.date_range(start=startDate+" 09:30:00-04:00", end=startDate+" 09:55:00-04:00", freq='5min')
    dr = df[df.index.isin(dt_range)]
    x = dr.High.idxmax(axis=0)
    print(x, df.High[x])
This is my output for the code above:
2024-03-14 09:40:00-04:00 424.95001220703125
--------------
2024-03-11 09:30:00-04:00 404.20001220703125
2024-03-12 09:30:00-04:00 409.677001953125
2024-03-13 09:30:00-04:00 418.17999267578125
2024-03-14 09:40:00-04:00 424.95001220703125
2024-03-15 09:30:00-04:00 422.6000061035156
Use pd.Grouper:
idx = df.groupby(pd.Grouper(freq="D"))["High"].idxmax().to_list()
df.loc[idx]
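One thing worth checking with this approach: pd.Grouper(freq="D") bins by calendar day, so a span that crosses a weekend gets empty Saturday and Sunday bins, and idxmax on an empty bin may raise or return a missing value depending on the pandas version. The sketch below uses two made-up bars (hypothetical values, not real quotes) and compares it with grouping on the dates that actually occur in the index:

```python
import pandas as pd

# Hypothetical bars: one on a Friday, one on the following Monday.
index = pd.DatetimeIndex(["2024-03-15 09:30", "2024-03-18 09:30"])
df = pd.DataFrame({"High": [422.6, 417.0]}, index=index)

# Calendar-day bins: Friday, Saturday, Sunday, Monday.
by_calendar_day = df.groupby(pd.Grouper(freq="D"))["High"].max()

# Only the dates present in the index: Friday and Monday.
by_present_date = df.groupby(df.index.normalize())["High"].max()

print(by_calendar_day)
print(by_present_date)
```

With exactly five consecutive weekdays of data, as in the question, both groupings should give the same days.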
Or use pd.DatetimeIndex.normalize:
df['High'].groupby(df.index.normalize()).max()
Output:
Datetime
2024-03-11 00:00:00-04:00 404.200012
2024-03-12 00:00:00-04:00 409.677002
2024-03-13 00:00:00-04:00 417.309998
2024-03-14 00:00:00-04:00 424.950012
2024-03-15 00:00:00-04:00 422.600006
Name: High, dtype: float64
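To try the whole thing without downloading anything, here is a minimal sketch on synthetic data. The prices are made up, and the names window, peak_times and result are only illustrative. It applies between_time first, then groups by normalized date and uses idxmax so that both the timestamp and the value of each day's peak come back:

```python
import pandas as pd

# Synthetic 5-minute bars over two days (hypothetical values, not real MSFT quotes).
index = pd.date_range("2024-03-11 09:30", "2024-03-12 16:00",
                      freq="5min", tz="America/New_York")
df = pd.DataFrame({"High": [400.0 + (i % 7) for i in range(len(index))]},
                  index=index)

# Plant known peaks inside the window, plus a larger value outside it.
df.loc[pd.Timestamp("2024-03-11 09:40", tz="America/New_York"), "High"] = 430.0
df.loc[pd.Timestamp("2024-03-11 10:30", tz="America/New_York"), "High"] = 500.0
df.loc[pd.Timestamp("2024-03-12 09:55", tz="America/New_York"), "High"] = 440.0

# Keep only 09:30-09:55 (both ends inclusive by default), then find the
# timestamp of each day's maximum and look those rows up.
window = df.between_time("09:30", "09:55")
peak_times = window.groupby(window.index.normalize())["High"].idxmax()
result = window.loc[peak_times.to_list(), ["High"]]

print(result)
```

The 10:30 value of 500.0 is ignored because it falls outside the window, so the result holds the 09:40 bar for the first day and the 09:55 bar for the second.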