Extracting dates from a sentence in spaCy

Question

I have a string like this:

"The dates are from 30 June 2019 to 1 January 2022 inclusive"

I want to extract the dates from this string using spaCy.

Here is my function so far:

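import spacy

# Assumption: an English pipeline with an NER component is installed;
# the model name below is illustrative, not taken from the original post
nlp = spacy.load("en_core_web_sm")

def extract_dates_with_year(text):
    doc = nlp(text)
    dates_with_year = []
    for ent in doc.ents:
        if ent.label_ == "DATE":
            dates_with_year.append(ent.text)
    return dates_with_year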

This returns the following output:

['30 June 2019 to 1 January 2022']

However, I want output like this:

['30 June 2019', '1 January 2022']

Answer 1

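The problem is that "to" is treated as part of the date. So when you run "for ent in doc.ents", your loop makes only one iteration, because "30 June 2019 to 1 January 2022" is recognised as a single entity.

Since you don't want this behaviour, you can modify your function to split each DATE entity on "to":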

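def extract_dates_with_year(text):
    doc = nlp(text)
    dates_with_year = []
    for ent in doc.ents:
        if ent.label_ == "DATE":
            # Split on " to " (with surrounding spaces) so that words
            # containing "to", such as "October", are left intact
            for ent_txt in ent.text.split(" to "):
                dates_with_year.append(ent_txt.strip())
    return dates_with_year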

This correctly handles date ranges like the one above, as well as single dates and strings containing several dates:

# Create some strings
txt_list = [
    "The dates are from 30 June 2019 to 1 January 2022 inclusive",
    "It was 4 August 2021",
    "We were gone from 21 April 2019 until 21 May 2020"
]

# Run the new function
[extract_dates_with_year(txt) for txt in txt_list]

# Output:
[
    ['30 June 2019', '1 January 2022'],
    ['4 August 2021'],
    ['21 April 2019', '21 May 2020']
]
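
Splitting on a plain substring can be fragile, because the letters "to" also occur inside month names such as "October". The following is a minimal sketch of a more defensive variant that splits only on whole-word connectors. It works on the entity text alone, so it needs no spaCy model to try out. The helper name split_date_range and the list of connectors ("to", "until", "through") are assumptions made for illustration; they are not part of spaCy or of the original answer.

```python
import re

# Hypothetical helper (not part of spaCy): split a DATE entity's text on
# whole-word range connectors, so "to" inside "October" is left alone.
# The connector list is an assumption; extend it to suit your data.
RANGE_CONNECTOR = re.compile(r"\s+(?:to|until|through)\s+", flags=re.IGNORECASE)

def split_date_range(entity_text):
    """Return the individual date strings contained in one entity's text."""
    parts = RANGE_CONNECTOR.split(entity_text)
    return [part.strip() for part in parts if part.strip()]

print(split_date_range("30 June 2019 to 1 January 2022"))
# ['30 June 2019', '1 January 2022']
print(split_date_range("1 October 2021 to 3 October 2021"))
# ['1 October 2021', '3 October 2021']
print(split_date_range("4 August 2021"))
# ['4 August 2021']
```

To use it inside extract_dates_with_year, the inner loop could iterate over split_date_range(ent.text) instead of over the result of ent.text.split(...).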
