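As input, I have several strings that contain dates in different formats, for example:

"Peter drinks tea at 16:45"
"My birthday is on 08-07-1990"
"On Sat 11 July I'll be back home"

I use dateutil.parser.parse to identify the dates in the strings. In the next step, I want to remove the dates from the strings so that only the remaining text is left.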
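>>> from dateutil.parser import parse
>>> # fuzzy_with_tokens=True also returns the substrings that were not parsed as part of the date
>>> dt, tokens = parse("April drinks tea at 16:45", fuzzy_with_tokens=True)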
>>> print(dt)
2018-04-17 16:45:00
>>> print('<missing>'.join(tokens))
drinks tea at
from dateutil import parser

data = ['Peter drinks tea at 16:45', 'My birthday is on 08-07-1990', "On Sat 11 July I'll be back home"]

def is_valid_date(date_str):
    """Return True if dateutil can parse the whole string as a date or time."""
    try:
        parser.parse(date_str)
        return True
    except (ValueError, OverflowError):
        return False

# Keep only the words that do not parse as a date on their own.
new_list = [' '.join([w for w in line.split() if not is_valid_date(w)]) for line in data]
print(new_list)
# ['Peter drinks tea at', 'My birthday is on', "On I'll be back home"]
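For comparison, here is a minimal sketch of the same word-by-word filter that uses only the standard library. It swaps dateutil for datetime.strptime with an explicit list of formats, so it is stricter and will only catch the styles listed. The names KNOWN_FORMATS, matches_known_format and strip_date_tokens are made up for this illustration, and the weekday and month names assume an English locale.

```python
from datetime import datetime

# Assumed formats for this sketch; extend the tuple to cover other styles.
KNOWN_FORMATS = ("%H:%M", "%d-%m-%Y", "%a", "%A", "%b", "%B")

def matches_known_format(token):
    """Return True if the token looks like a date or time fragment."""
    # A bare number from 1 to 31 is treated as a day of the month.
    if token.isdigit() and 1 <= int(token) <= 31:
        return True
    for fmt in KNOWN_FORMATS:
        try:
            datetime.strptime(token, fmt)
            return True
        except ValueError:
            continue
    return False

def strip_date_tokens(line):
    """Drop every whitespace-separated token that matches a known format."""
    return " ".join(w for w in line.split() if not matches_known_format(w))

data = ['Peter drinks tea at 16:45', 'My birthday is on 08-07-1990', "On Sat 11 July I'll be back home"]
print([strip_date_tokens(line) for line in data])
```

Like the dateutil version, this checks each word in isolation, so a name such as "April" or "May", or any small number, would also be removed.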
Try this:

import re

def remove_dates(sentence):
    """Remove dates like 'Mar 30 2013'."""
    # Raw string so that \s and \d reach the regex engine unchanged.
    sentence = re.sub(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s\d{2}\s\d{4}', ' ', sentence)
    return sentence
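The pattern above only covers the "Mar 30 2013" style. As a rough sketch, the same regex idea could be extended to the three example formats from the question. DATE_PATTERN and remove_dates_multi are hypothetical names for this illustration, and the alternatives are assumptions about which formats matter, not a general date grammar.

```python
import re

# Building blocks: abbreviated or full English month and weekday names.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"
WEEKDAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)[a-z]*"

# One alternative per supported format (assumed set, extend as needed).
DATE_PATTERN = re.compile(
    r"\b(?:"
    rf"{MONTH}\s\d{{1,2}}\s\d{{4}}"          # Mar 30 2013
    rf"|(?:{WEEKDAY}\s)?\d{{1,2}}\s{MONTH}"  # Sat 11 July, 11 July
    r"|\d{2}-\d{2}-\d{4}"                    # 08-07-1990
    r"|\d{1,2}:\d{2}"                        # 16:45
    r")\b"
)

def remove_dates_multi(sentence):
    """Remove every DATE_PATTERN match and collapse leftover whitespace."""
    stripped = DATE_PATTERN.sub(" ", sentence)
    return " ".join(stripped.split())

print(remove_dates_multi("On Sat 11 July I'll be back home"))
# On I'll be back home
```

Because the month and weekday names only match as part of a full date expression, a name like "April" on its own is left in place, which avoids the false positive seen with fuzzy parsing.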