I can convert a specific date column to the start of the month, for example changing 24-05-2024 to 01-05-24, using the following code:
df['date_sm'] = pd.to_datetime(df['date']) - MonthBegin(1)
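I would like a function (def) where I only specify the name of the column I want to change and can then apply it, instead of repeating the code for every column. Is there a way to do this?
If you want a function that modifies the DataFrame in place: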
def col_to_monthbegin(df, col):
    df[f'{col}_sm'] = pd.to_datetime(df[col]) - pd.offsets.MonthBegin(1)
Example:
df = pd.DataFrame({'A': 1, 'date': pd.date_range('2024-01-01', periods=5)})
col_to_monthbegin(df, 'date')
print(df)
#    A       date    date_sm
# 0  1 2024-01-01 2023-12-01
# 1  1 2024-01-02 2024-01-01
# 2  1 2024-01-03 2024-01-01
# 3  1 2024-01-04 2024-01-01
# 4  1 2024-01-05 2024-01-01
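Note that a date that is already the first day of a month is shifted to the first day of the previous month. To avoid this, you can use: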
def col_to_monthbegin(df, col):
    df[f'{col}_sm'] = pd.to_datetime(df[col]) + pd.DateOffset(day=1)
# or
def col_to_monthbegin(df, col):
    df[f'{col}_sm'] = (pd.to_datetime(df[col])
                       + pd.offsets.MonthBegin(1)
                       - pd.offsets.MonthBegin(1)
                       )
# example
#    A       date    date_sm
# 0  1 2024-01-01 2024-01-01
# 1  1 2024-01-02 2024-01-01
# 2  1 2024-01-03 2024-01-01
# 3  1 2024-01-04 2024-01-01
# 4  1 2024-01-05 2024-01-01
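To see why the first version shifts dates that already fall on the first of a month, it can help to compare the two offsets on scalar timestamps. This is only a small sketch; the variable names are illustrative and not part of the answer above.

```python
import pandas as pd

first = pd.Timestamp('2024-01-01')  # already a month start
mid = pd.Timestamp('2024-01-15')    # mid-month date

# Subtracting MonthBegin(1) always steps strictly backwards to a month start,
# so a date that is already a month start lands on the previous month.
shifted_first = first - pd.offsets.MonthBegin(1)
shifted_mid = mid - pd.offsets.MonthBegin(1)

# DateOffset(day=1) replaces the day component instead of stepping.
replaced_first = first + pd.DateOffset(day=1)
replaced_mid = mid + pd.DateOffset(day=1)

print(shifted_first.date(), shifted_mid.date())    # 2023-12-01 2024-01-01
print(replaced_first.date(), replaced_mid.date())  # 2024-01-01 2024-01-01
```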
# variant that accepts a list of columns
def col_to_monthbegin(df, cols):
    tmp = df[cols].apply(lambda x: pd.to_datetime(x)
                                   + pd.offsets.MonthBegin(1)
                                   - pd.offsets.MonthBegin(1))
    df[tmp.columns.astype(str)+'_sm'] = tmp
df = pd.DataFrame({'A': 1, 'date': pd.date_range('2024-01-01', periods=5),
                   'date2': pd.date_range('2025-01-01', periods=5)})
col_to_monthbegin(df, ['date', 'date2'])
print(df)
#    A       date      date2    date_sm   date2_sm
# 0  1 2024-01-01 2025-01-01 2024-01-01 2025-01-01
# 1  1 2024-01-02 2025-01-02 2024-01-01 2025-01-01
# 2  1 2024-01-03 2025-01-03 2024-01-01 2025-01-01
# 3  1 2024-01-04 2025-01-04 2024-01-01 2025-01-01
# 4  1 2024-01-05 2025-01-05 2024-01-01 2025-01-01
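If modifying the DataFrame in place is not wanted, a variant can return a copy instead. The sketch below swaps in a different technique from the offsets above, `dt.to_period('M').dt.to_timestamp()`. The function name `with_monthbegin` and its `suffix` parameter are hypothetical, chosen only for this example.

```python
import pandas as pd

def with_monthbegin(df, cols, suffix='_sm'):
    """Return a copy of df with a month-start column added for each column in cols."""
    out = df.copy()
    for col in cols:
        dates = pd.to_datetime(out[col])
        # Convert to monthly periods, then back to the timestamp at the period start.
        out[f'{col}{suffix}'] = dates.dt.to_period('M').dt.to_timestamp()
    return out

# Dates chosen to span a month boundary, including a first-of-month value.
df = pd.DataFrame({'A': 1, 'date': pd.date_range('2024-01-30', periods=4)})
result = with_monthbegin(df, ['date'])
print(result)
```

The original `df` is left unchanged. Note that going through periods drops any time-of-day component, whereas the offset-based versions keep it.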
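Not quite sure whether this is what you want. Here you pass the DataFrame and a list of the columns to change. This overwrites the passed columns and does not create new ones.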
import pandas as pd
from pandas.tseries.offsets import MonthBegin

def convert_to_start_of_month(df, date_columns):
    # Overwrite each listed column with the start of its month.
    # Caveat: a date already on the 1st moves to the 1st of the previous month.
    for col in date_columns:
        df[col] = pd.to_datetime(df[col]) - MonthBegin(1)
    return df

data = {
    'date1': ['2024-05-24', '2024-06-15', '2024-07-30'],
    'date2': ['2023-12-10', '2023-11-22', '2023-10-05'],
    'other_column': [1, 2, 3]
}
df = pd.DataFrame(data)
date_columns = ['date1', 'date2']
df = convert_to_start_of_month(df, date_columns)
print(df)
# output:
#
#        date1      date2  other_column
# 0 2024-05-01 2023-12-01             1
# 1 2024-06-01 2023-11-01             2
# 2 2024-07-01 2023-10-01             3
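The question writes its dates day-first (24-05-2024) and wants output like 01-05-24, so it may be worth passing an explicit format when parsing and formatting back to text at the end. Below is a rough sketch along those lines. The name `convert_to_start_of_month_safe` and the `fmt` parameter are assumptions made for this example, and it uses the add-then-subtract trick so that dates already on the 1st stay in their own month.

```python
import pandas as pd

def convert_to_start_of_month_safe(df, date_columns, fmt=None):
    """Overwrite each column in date_columns with the first day of its month."""
    for col in date_columns:
        # fmt is passed straight to pd.to_datetime; None lets pandas infer it.
        parsed = pd.to_datetime(df[col], format=fmt)
        # Add then subtract one MonthBegin so that dates already on the
        # first of a month stay in that month.
        df[col] = parsed + pd.offsets.MonthBegin(1) - pd.offsets.MonthBegin(1)
    return df

df = pd.DataFrame({'date1': ['24-05-2024', '01-06-2024'], 'other_column': [1, 2]})
df = convert_to_start_of_month_safe(df, ['date1'], fmt='%d-%m-%Y')

# Format back to day-month-two-digit-year strings, as in the question.
formatted = df['date1'].dt.strftime('%d-%m-%y').tolist()
print(formatted)  # ['01-05-24', '01-06-24']
```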