How do I get the start and end datetime index of groups of consecutive values in pandas, including repeated values?


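There are many answers based on a numeric index, but I am looking for a solution that works with a DatetimeIndex, and I am really stuck. The closest answer I found for a numeric index is this one, but it does not work for my example.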

I want to get the group start and end DateTime for groups of n consecutive values in a DataFrame column.

Sample data:

import pandas as pd


index = pd.date_range(
    start=pd.Timestamp("2023-03-20 12:00:00+0000", tz="UTC"),
    end=pd.Timestamp("2023-03-20 15:00:00+0000", tz="UTC"),
    freq="15Min",
)
data = {
    "values_including_constant_groups": [
        2.0,
        1.0,
        1.0,
        3.0,
        3.0,
        3.0,
        4.0,
        4.0,
        4.0,
        2.0,
        3.0,
        3.0,
        1.0,
    ],
}
df = pd.DataFrame(
    index=index,
    data=data,
)

print(df)

Output:

                        values_including_constant_groups
2023-03-20 12:00:00+00:00                               2.0
2023-03-20 12:15:00+00:00                               1.0
2023-03-20 12:30:00+00:00                               1.0
2023-03-20 12:45:00+00:00                               3.0
2023-03-20 13:00:00+00:00                               3.0
2023-03-20 13:15:00+00:00                               3.0
2023-03-20 13:30:00+00:00                               4.0
2023-03-20 13:45:00+00:00                               4.0
2023-03-20 14:00:00+00:00                               4.0
2023-03-20 14:15:00+00:00                               2.0
2023-03-20 14:30:00+00:00                               3.0
2023-03-20 14:45:00+00:00                               3.0
2023-03-20 15:00:00+00:00                               1.0

Desired output (I am flexible here, but this is my first idea):

                        values_including_constant_groups   group_start      group_end
2023-03-20 12:00:00+00:00                               2.0   NaN              NaN
2023-03-20 12:15:00+00:00                               1.0   True             False
2023-03-20 12:30:00+00:00                               1.0   False            True
2023-03-20 12:45:00+00:00                               3.0   True             False
2023-03-20 13:00:00+00:00                               3.0   False            False
2023-03-20 13:15:00+00:00                               3.0   False            True
2023-03-20 13:30:00+00:00                               4.0   True             False
2023-03-20 13:45:00+00:00                               4.0   False            False
2023-03-20 14:00:00+00:00                               4.0   False            True
2023-03-20 14:15:00+00:00                               2.0   NaN              NaN
2023-03-20 14:30:00+00:00                               3.0   True             False
2023-03-20 14:45:00+00:00                               3.0   False            True
2023-03-20 15:00:00+00:00                               1.0   NaN              NaN

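So only groups with n>=2 should be considered here, and "single" values should be excluded. In addition, repeated groups should be included (the same value appearing in more than one run, such as 3.0 in the sample).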

Any hints are welcome!

python pandas
1 Answer

Code

c = 'values_including_constant_groups'

# Flag group starts (row differs from the previous row) and
# group ends (row differs from the next row)
s = df[c] != df[c].shift()
e = df[c] != df[c].shift(-1)

# Rows where both flags are True are runs of length 1 (n == 1);
# mask() replaces the flags on those rows with NaN
single = s & e
df['group_start'] = s.mask(single)
df['group_end'] = e.mask(single)

Result

                           values_including_constant_groups group_start group_end
2023-03-20 12:00:00+00:00                               2.0         NaN       NaN
2023-03-20 12:15:00+00:00                               1.0        True     False
2023-03-20 12:30:00+00:00                               1.0       False      True
2023-03-20 12:45:00+00:00                               3.0        True     False
2023-03-20 13:00:00+00:00                               3.0       False     False
2023-03-20 13:15:00+00:00                               3.0       False      True
2023-03-20 13:30:00+00:00                               4.0        True     False
2023-03-20 13:45:00+00:00                               4.0       False     False
2023-03-20 14:00:00+00:00                               4.0       False      True
2023-03-20 14:15:00+00:00                               2.0         NaN       NaN
2023-03-20 14:30:00+00:00                               3.0        True     False
2023-03-20 14:45:00+00:00                               3.0       False      True
2023-03-20 15:00:00+00:00                               1.0         NaN       NaN
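
If a compact table of group boundaries is more convenient than per-row flags, one possible variation is to label each run of equal consecutive values with a cumulative-sum id and aggregate per run. This is only a minimal sketch and not part of the answer above; the names run_id, timestamp, and groups are illustrative assumptions, and the sample data is rebuilt so the block runs on its own.

```python
import pandas as pd

# Rebuild the sample data from the question
index = pd.date_range(
    "2023-03-20 12:00", "2023-03-20 15:00", freq="15min", tz="UTC"
)
col = "values_including_constant_groups"
values = [2.0, 1.0, 1.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 2.0, 3.0, 3.0, 1.0]
df = pd.DataFrame({col: values}, index=index)

# A new run starts wherever a value differs from the previous row;
# the cumulative sum of those flags gives every run its own id, so
# a value that recurs later (3.0 here) lands in a separate run
run_id = df[col].ne(df[col].shift()).cumsum()

# One row per run: its value, first and last timestamp, and length
groups = (
    df.assign(timestamp=df.index, run_id=run_id)
    .groupby("run_id")
    .agg(
        value=(col, "first"),
        group_start=("timestamp", "first"),
        group_end=("timestamp", "last"),
        n=(col, "size"),
    )
)

# Keep only runs with at least two rows, dropping "single" values
groups = groups[groups["n"] >= 2].reset_index(drop=True)
print(groups)
```

For the sample data this should leave four runs (values 1.0, 3.0, 4.0, and 3.0 again), each with its start and end timestamp taken directly from the DatetimeIndex.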