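I have a df with this index, and at some point the dates jump from 2024-03-03 back to 2023-02-25. I want to replace the wrong part (2023...) with the logical continuation of the correct part.
Sample: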
2024-02-23 -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26 -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02 -5.97000
2024-03-03 -1.30000
2023-02-25 -35.40000
2023-02-26 -28.70000
2023-02-27 -26.40000
2023-02-28 -15.40000
2023-03-01 -14.10000
2023-03-02 -11.20000
2023-03-03 -21.00000
2023-03-04 -17.00000
2023-03-05 -17.60000
2023-03-06 -6.70000
How can I do this in a clean, Pythonic way?
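To fix a DataFrame index in which part of the dates are wrong (for example, they jump back a year), you can find the point of discontinuity and then correct the dates from that point on: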
import pandas as pd
dates = ['2024-02-23', '2024-02-24', '2024-02-25', '2024-02-26', '2024-02-27',
         '2024-02-28', '2024-02-29', '2024-03-01', '2024-03-02', '2024-03-03',
         '2023-02-25', '2023-02-26', '2023-02-27', '2023-02-28', '2023-03-01',
         '2023-03-02', '2023-03-03', '2023-03-04', '2023-03-05', '2023-03-06']
values = [-5.6, -13.0, -27.2, -4.2, -11.2, -14.73625, -19.37, -16.89, -5.97,
          -1.3, -35.4, -28.7, -26.4, -15.4, -14.1, -11.2, -21.0, -17.0, -17.6, -6.7]
df = pd.DataFrame(values, index=pd.to_datetime(dates), columns=['Values'])
# Find the first row whose date is earlier than the date in the previous row
breaks = df.index.to_series().diff() < pd.Timedelta(0)
if breaks.any():
    start = breaks.to_numpy().argmax()  # position of the first wrong date
    # Replace everything from the break onward with a daily continuation of the correct part
    fixed = pd.date_range(df.index[start - 1] + pd.Timedelta(days=1), periods=len(df) - start, freq='D')
    df.index = df.index[:start].append(fixed)
print(df)
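If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `continue_index_after_break` and its `freq` parameter are names made up for illustration, and it assumes that the rows before the first backwards jump are correct and that everything after it should simply follow on at a regular frequency.

```python
import pandas as pd


def continue_index_after_break(frame: pd.DataFrame, freq: str = "D") -> pd.DataFrame:
    """Return a copy whose index continues at `freq` from the first backwards jump.

    Hypothetical helper: assumes the rows before the jump are correct and
    that the rows after it should follow on at a regular frequency.
    """
    idx = pd.DatetimeIndex(frame.index)
    # True wherever a date is earlier than the date in the previous row
    went_back = idx[1:] < idx[:-1]
    if not went_back.any():
        return frame.copy()
    start = int(went_back.argmax()) + 1  # position of the first wrong row
    # Start the range at the last good date, then drop that first element
    tail = pd.date_range(idx[start - 1], periods=len(idx) - start + 1, freq=freq)[1:]
    out = frame.copy()
    out.index = idx[:start].append(tail)
    return out


sample = pd.DataFrame(
    {"Values": [-5.97, -1.3, -35.4, -28.7]},
    index=pd.to_datetime(["2024-03-02", "2024-03-03", "2023-02-25", "2023-02-26"]),
)
fixed = continue_index_after_break(sample)
print(fixed)
```

If you already know the data is strictly daily with no gaps, rebuilding the whole index with `pd.date_range(df.index[0], periods=len(df), freq="D")` is shorter, but it also overwrites the part of the index that was already correct.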