将长写日期转换为Python中的标准格式?

问题描述 投票:0回答:1

我有一系列文本块,其中包含写为“2021 年 9 月第一个星期三”或“2022 年 7 月第三个星期一”等的日期。我不确定提取文本和重新格式化的最佳方法它作为标准的“月日,年”格式。我尝试过使用启用模糊匹配的日期查找器库,但“第一个星期二”和其他库都失败了,我相信因为它不是正常的日期格式。任何想法将不胜感激,谢谢大家!

python datetime string-formatting datefinder
1个回答
0
投票

假设文本中的所有日期均为

The cardinal day_of_week of Month, Year
格式(您必须在第二个日期中将 in 替换为 of):

import calendar
import re

text = [
    "The first Wednesday of September, 2021",
    "The third Monday of July, 2022",
    # more dates
]

pattern = r"The (\w+) (\w+) of (\w+), (\d{4})"

cardinal = {
    "first": 1,
    "second": 2,
    "third": 3,
    "fourth": 4,
    "fifth": 5
}


def find_nth_day_of_week(year_str, month_name, day_of_week, n_str):
    year = int(year_str)

    month = list(calendar.month_name).index(month_name.capitalize())
    if month == 0:
        return None

    n = cardinal.get(n_str.lower())
    if n is None:
        return None

    cal = calendar.monthcalendar(year, month)

    day_index = list(calendar.day_name).index(day_of_week.capitalize())

    nth_occurrence = [week[day_index] for week in cal if week[day_index] != 0]
    if n > len(nth_occurrence):
        return None

    day = nth_occurrence[n - 1]
    date = f"{calendar.month_abbr[month]} {day}, {year}"
    return date


def parse_text(text):
    match = re.match(pattern, text)
    if match:
        cardinal, day_of_week, month, year = match.groups()
        return find_nth_day_of_week(year, month, day_of_week, cardinal)
    return None


dates = [parse_text(block) for block in text]

for i, date in enumerate(dates):
    print(f"Date {i + 1}: {date}")
© www.soinside.com 2019 - 2024. All rights reserved.