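I have spent a lot of time looking for a "good" solution to this problem, but what I mostly found were solutions for a single (i.e. not multi-level) index. For example:
Group python pandas dataframe per weeks (starting on Monday)
I eventually settled on the following, which works for my case, but I would like to know whether there is a smarter way to achieve this: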
df = df.groupby([df.index.weekday, df.index.time]).sum()
df.index = df.index.set_levels(df.index.levels[0].map({0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday", 4: "Friday", 5: "Saturday", 6: "Sunday"}), level=0)
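For anyone who wants to reproduce this, here is a minimal self-contained sketch of the snippet above. The DataFrame, its "value" column and the date range are made up for illustration; only the groupby and set_levels calls come from my actual code.

```python
import pandas as pd

# Hypothetical data: one week of hourly values, starting Monday 2024-04-01.
idx = pd.date_range("2024-04-01", periods=7 * 24, freq="60min")
df = pd.DataFrame({"value": [1] * len(idx)}, index=idx)

names = {0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday",
         4: "Friday", 5: "Saturday", 6: "Sunday"}

# Group by (weekday number, time of day), then relabel level 0 with day names.
out = df.groupby([df.index.weekday, df.index.time]).sum()
out.index = out.index.set_levels(out.index.levels[0].map(names), level=0)
```

The result has one row per (day name, time of day) pair, with the days kept in Monday-to-Sunday order.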
You can use calendar.day_name:
Sample data
import pandas as pd
from calendar import day_name
index = pd.MultiIndex.from_product(
    [range(7), pd.date_range("00:00", periods=2, freq="60min")]
)
index
MultiIndex([(0, '2024-04-18 00:00:00'),
(0, '2024-04-18 01:00:00'),
(1, '2024-04-18 00:00:00'),
(1, '2024-04-18 01:00:00'),
(2, '2024-04-18 00:00:00'),
(2, '2024-04-18 01:00:00'),
(3, '2024-04-18 00:00:00'),
(3, '2024-04-18 01:00:00'),
(4, '2024-04-18 00:00:00'),
(4, '2024-04-18 01:00:00'),
(5, '2024-04-18 00:00:00'),
(5, '2024-04-18 01:00:00'),
(6, '2024-04-18 00:00:00'),
(6, '2024-04-18 01:00:00')],
)
Code
Pass list(day_name) to pd.MultiIndex.set_levels:
index.set_levels(list(day_name), level=0)
MultiIndex([( 'Monday', '2024-04-18 00:00:00'),
( 'Monday', '2024-04-18 01:00:00'),
( 'Tuesday', '2024-04-18 00:00:00'),
( 'Tuesday', '2024-04-18 01:00:00'),
('Wednesday', '2024-04-18 00:00:00'),
('Wednesday', '2024-04-18 01:00:00'),
( 'Thursday', '2024-04-18 00:00:00'),
( 'Thursday', '2024-04-18 01:00:00'),
( 'Friday', '2024-04-18 00:00:00'),
( 'Friday', '2024-04-18 01:00:00'),
( 'Saturday', '2024-04-18 00:00:00'),
( 'Saturday', '2024-04-18 01:00:00'),
( 'Sunday', '2024-04-18 00:00:00'),
( 'Sunday', '2024-04-18 01:00:00')],
)
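Applied to a grouped DataFrame like the one in the question, this could look roughly like the sketch below. The helper name weekday_time_sum, the "value" column and the dates are assumptions made for illustration. One caveat: set_levels replaces the levels by position, so if some weekdays are missing from the data, passing the full list(day_name) would likely attach the wrong names. Building the names from the levels that are actually present avoids that. Note also that calendar.day_name follows the current locale, so the names are English only under an English or default locale.

```python
import pandas as pd
from calendar import day_name


def weekday_time_sum(df):
    """Sum df by (weekday, time of day) and label level 0 with day names."""
    out = df.groupby([df.index.weekday, df.index.time]).sum()
    # Build the names from the weekday numbers actually present in level 0,
    # so a missing weekday cannot shift the labels.
    names = [day_name[int(i)] for i in out.index.levels[0]]
    out.index = out.index.set_levels(names, level=0)
    return out


# Hypothetical data: two weeks of hourly values, starting Monday 2024-04-01.
rng = pd.date_range("2024-04-01", periods=14 * 24, freq="60min")
df = pd.DataFrame({"value": [1] * len(rng)}, index=rng)

full = weekday_time_sum(df)

# Only Tuesdays and Thursdays: level 0 then holds [1, 3] rather than 0..6.
partial = weekday_time_sum(df[df.index.weekday.isin([1, 3])])
print(partial.index.get_level_values(0).unique().tolist())
```

When all seven weekdays are present, the helper gives the same labels as index.set_levels(list(day_name), level=0).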