Calculate and explode a date range between two columns in Polars

Question  Votes: 0  Answers: 1

I need to compute all month ends between two date columns and explode the resulting list.

import polars as pl
from datetime import datetime

df = pl.DataFrame(
    {
        "id": ["A", "A", "A", "B", "B"],
        "value": ["1", "2", "3", "4", "5"],
        "valid_from": [
            datetime(2020, 1, 1),
            datetime(2021, 1, 1),
            datetime(2022, 1, 1),
            datetime(2020, 1, 1),
            datetime(2021, 1, 1),
        ],
        "valid_to": [
            datetime(2020, 12, 31),
            datetime(2021, 12, 31),
            datetime(2022, 12, 31),
            datetime(2020, 12, 31),
            datetime(2021, 12, 31),
        ],
    }
)

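def __month_range(row):
    # row is a dict holding the struct fields in order: valid_from, valid_to
    start, end = row.values()
    # Step month by month from start to end, then snap each date to its month end
    return pl.date_range(start, end, "1mo_saturating", eager=True).dt.month_end()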

df.with_columns(
    pl.struct(["valid_from","valid_to"]).apply(__month_range).alias("test")
).explode("test")

Is this the right way to do it? Or is there a simpler/faster way that does not use a struct?

date datetime date-range python-polars
1 Answer
6 votes

[Update]:

pl.date_ranges() (plural) was added in Polars 0.18.9 and does this directly.

>>> pl.date_ranges("valid_from", "valid_to")
<polars.expr.expr.Expr at 0x135015b70>
df.with_columns(date = pl.date_ranges("valid_from", "valid_to"))
# shape: (5, 5)
# ┌─────┬───────┬─────────────────────┬─────────────────────┬───────────────────────────────────┐
# │ id  ┆ value ┆ valid_from          ┆ valid_to            ┆ date                              │
# │ --- ┆ ---   ┆ ---                 ┆ ---                 ┆ ---                               │
# │ str ┆ str   ┆ datetime[μs]        ┆ datetime[μs]        ┆ list[datetime[μs]]                │
# ╞═════╪═══════╪═════════════════════╪═════════════════════╪═══════════════════════════════════╡
# │ A   ┆ 1     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ [2020-01-01 00:00:00, 2020-01-02… │
# │ A   ┆ 2     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ [2021-01-01 00:00:00, 2021-01-02… │
# │ A   ┆ 3     ┆ 2022-01-01 00:00:00 ┆ 2022-12-31 00:00:00 ┆ [2022-01-01 00:00:00, 2022-01-02… │
# │ B   ┆ 4     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ [2020-01-01 00:00:00, 2020-01-02… │
# │ B   ┆ 5     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ [2021-01-01 00:00:00, 2021-01-02… │
# └─────┴───────┴─────────────────────┴─────────────────────┴───────────────────────────────────┘
(df.with_columns(date = pl.date_ranges("valid_from", "valid_to"))
   .explode("date")
   .with_columns(month_end = pl.col("date").dt.month_end())
)
# shape: (1_827, 6)
# ┌─────┬───────┬─────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
# │ id  ┆ value ┆ valid_from          ┆ valid_to            ┆ date                ┆ month_end           │
# │ --- ┆ ---   ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---                 │
# │ str ┆ str   ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        │
# ╞═════╪═══════╪═════════════════════╪═════════════════════╪═════════════════════╪═════════════════════╡
# │ A   ┆ 1     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ 2020-01-01 00:00:00 ┆ 2020-01-31 00:00:00 │
# │ A   ┆ 1     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ 2020-01-02 00:00:00 ┆ 2020-01-31 00:00:00 │
# │ A   ┆ 1     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ 2020-01-03 00:00:00 ┆ 2020-01-31 00:00:00 │
# │ A   ┆ 1     ┆ 2020-01-01 00:00:00 ┆ 2020-12-31 00:00:00 ┆ 2020-01-04 00:00:00 ┆ 2020-01-31 00:00:00 │
# │ …   ┆ …     ┆ …                   ┆ …                   ┆ …                   ┆ …                   │
# │ B   ┆ 5     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ 2021-12-28 00:00:00 ┆ 2021-12-31 00:00:00 │
# │ B   ┆ 5     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ 2021-12-29 00:00:00 ┆ 2021-12-31 00:00:00 │
# │ B   ┆ 5     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ 2021-12-30 00:00:00 ┆ 2021-12-31 00:00:00 │
# │ B   ┆ 5     ┆ 2021-01-01 00:00:00 ┆ 2021-12-31 00:00:00 ┆ 2021-12-31 00:00:00 ┆ 2021-12-31 00:00:00 │
# └─────┴───────┴─────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘