Dynamically add a consecutive date column to a dataframe, where each date is repeated N times in consecutive rows

Question

I have the following dataframe:

import pandas as pd
import random

data = {
    'Column1': [random.randint(1, 100) for _ in range(9)],
    'Column2': [random.uniform(0, 1) for _ in range(9)],
    'Column3': [chr(random.randint(65, 90)) for _ in range(9)],
    'Column4': [random.choice(['A', 'B', 'C']) for _ in range(9)]
}
df = pd.DataFrame(data)

   Column1   Column2 Column3 Column4
0       87  0.208179       M       C
1       85  0.049071       Q       C
2        4  0.474926       X       C
3       35  0.966357       L       B
4       58  0.295134       C       B
5       23  0.633367       R       B
6       87  0.069583       V       B
7       83  0.427594       N       A
8       16  0.592413       R       C

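I want to add a new consecutive DATE column whose entries start at a chosen start_date (= '2022-01-01'), with each date repeated N (=2) times in consecutive rows across the whole dataset. I also want to add another column named SHIFT that cycles through the n (=2) values in selected_values = ['Day', 'Night'].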

## desired output for N=2 and start_date = '2022-01-01', and n=2 with selected_values = ['Day','Night']

   Column1   Column2 Column3 Column4        DATE  SHIFT
0       87  0.208179       M       C  2022-01-01    Day
1       85  0.049071       Q       C  2022-01-01  Night
2        4  0.474926       X       C  2022-01-02    Day
3       35  0.966357       L       B  2022-01-02  Night
4       58  0.295134       C       B  2022-01-03    Day
5       23  0.633367       R       B  2022-01-03  Night
6       87  0.069583       V       B  2022-01-04    Day
7       83  0.427594       N       A  2022-01-04  Night
8       16  0.592413       R       C  2022-01-05    Day

- N, n, selected_values, and start_date are all dynamic, where n is the number of elements in selected_values.

To make this clearer, here is the desired output for a different set of parameters:

## desired output for N=2 and start_date = '2022-01-01', and n=3 with selected_values = ['Day','Night','Afternoon']

   Column1   Column2 Column3 Column4        DATE      SHIFT
0       87  0.208179       M       C  2022-01-01        Day
1       85  0.049071       Q       C  2022-01-01      Night
2        4  0.474926       X       C  2022-01-02  Afternoon
3       35  0.966357       L       B  2022-01-02        Day
4       58  0.295134       C       B  2022-01-03      Night
5       23  0.633367       R       B  2022-01-03  Afternoon
6       87  0.069583       V       B  2022-01-04        Day
7       83  0.427594       N       A  2022-01-04      Night
8       16  0.592413       R       C  2022-01-05  Afternoon
python dataframe date repeat
1 Answer
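from datetime import datetime, timedelta

def add_dates_shifts(df, start_date, N, shifts, n):
    # Use row positions rather than df.index, so a non-default index also works.
    start = datetime.fromisoformat(start_date)
    # Row i gets start + (i // N) days, so each date repeats N times.
    df["DATE"] = [start + timedelta(days=i // N) for i in range(len(df))]
    # i % n wraps around, so the first n shifts cycle row by row.
    df["SHIFT"] = [shifts[i % n] for i in range(len(df))]

# Adds the columns to df in place; the function returns None.
add_dates_shifts(df, "2022-01-01", 2, ["Day", "Night"], 2)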

   Column1   Column2 Column3 Column4       DATE  SHIFT
0       48  0.283085       J       C 2022-01-01    Day
1       76  0.896891       J       B 2022-01-01  Night
2       58  0.867856       Z       B 2022-01-02    Day
3       60  0.175652       H       C 2022-01-02  Night
4       50  0.222947       S       A 2022-01-03    Day
5       36  0.753228       A       A 2022-01-03  Night
6       48  0.032319       W       B 2022-01-04    Day
7       10  0.412104       T       A 2022-01-04  Night
8       61  0.266018       Q       A 2022-01-05    Day
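
As a possible alternative, the same idea can be written without Python-level loops. The following is only a sketch under some assumptions: the function name add_dates_shifts_vectorized is hypothetical, it returns a copy rather than modifying df in place, and it derives n from len(selected_values) rather than taking it as a separate argument. The date offset for row i is i // N and the shift for row i is selected_values[i % n].

```python
import numpy as np
import pandas as pd


def add_dates_shifts_vectorized(df, start_date, N, selected_values):
    """Return a copy of df with DATE and SHIFT columns added.

    Hypothetical helper for illustration; not the function from the answer above.
    """
    positions = np.arange(len(df))  # 0, 1, 2, ... regardless of df.index
    out = df.copy()
    # Integer division groups rows into blocks of N; each block shares one date.
    day_offsets = pd.to_timedelta(positions // N, unit="D")
    out["DATE"] = pd.to_datetime(start_date) + day_offsets
    # Modulo wraps around the list, so the shifts cycle row by row.
    n = len(selected_values)
    out["SHIFT"] = np.asarray(selected_values)[positions % n]
    return out


# Usage with the second parameter set from the question (N=2, n=3).
df = pd.DataFrame({"Column1": range(9)})
result = add_dates_shifts_vectorized(
    df, "2022-01-01", 2, ["Day", "Night", "Afternoon"]
)
print(result)
```

With these parameters the DATE and SHIFT columns should follow the pattern of the second desired output in the question: each date appears twice, while the three shifts cycle independently of the dates.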