How can I convert an ISO 8601 duration string to datetime.timedelta?
I tried instantiating timedelta with the duration string and a format string, but I got an exception:
>>> from datetime import timedelta
>>> timedelta("PT1H5M26S", "T%H%M%S")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported type for timedelta seconds component: str
One option is pandas.Timedelta. Its constructor accepts an ISO 8601 duration string, and pandas.Timedelta.isoformat formats the instance back into a string:
>>> import pandas as pd
>>> dt = pd.Timedelta("PT1H5M26S")
>>> dt
Timedelta('0 days 01:05:26')
>>> dt.isoformat()
'P0DT1H5M26S'
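Here's a solution that needs no new packages, but it only works if days are the largest unit in the durations you handle. That limitation makes sense, though, because as others have pointed out (1):
Given that the timedelta has more days than "a month", how would you describe it using the ISO8601 duration notation without referencing a specific point in time? Conversely, given your example, "P3Y6M4DT12H30M5S", how would you convert that into a timedelta without knowing which exact years and months this duration refers to? Timedelta objects are very precise beasts, which is almost certainly why they don't support "years" and "months" args in their constructors.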
import datetime

def get_isosplit(s, split):
    if split in s:
        n, s = s.split(split)
    else:
        n = 0
    return n, s

def parse_isoduration(s):
    # Remove prefix
    s = s.split('P')[-1]
    # Step through letter dividers
    days, s = get_isosplit(s, 'D')
    _, s = get_isosplit(s, 'T')
    hours, s = get_isosplit(s, 'H')
    minutes, s = get_isosplit(s, 'M')
    seconds, s = get_isosplit(s, 'S')
    # Convert all to seconds
    dt = datetime.timedelta(days=int(days), hours=int(hours), minutes=int(minutes), seconds=int(seconds))
    return int(dt.total_seconds())

>>> parse_isoduration("PT1H5M26S")
3926
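To make the quoted point about months concrete, here is a small sketch using only the standard library. The dates are arbitrary examples chosen for illustration; the point is just that the same calendar step, "one month", gives different timedeltas depending on where it starts.

```python
import datetime

# The same calendar step, "one month", measured from two different start dates.
feb = datetime.date(2024, 3, 1) - datetime.date(2024, 2, 1)
mar = datetime.date(2024, 4, 1) - datetime.date(2024, 3, 1)

print(feb.days)  # 29 (2024 is a leap year)
print(mar.days)  # 31
```

So a duration like "P1M" has no single timedelta equivalent unless you also know the start date.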
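Great question. Obviously the "right" solution depends on what you expect from the input (a more reliable data source doesn't need as much input validation).
My approach to parsing an ISO8601 duration string only checks that the "PT" prefix is present, and it does not assume integer values for any of the units: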
from datetime import timedelta

def parse_isoduration(isostring, as_dict=False):
    """
    Parse the ISO8601 duration string as hours, minutes, seconds
    """
    separators = {
        "PT": None,
        "W": "weeks",
        "D": "days",
        "H": "hours",
        "M": "minutes",
        "S": "seconds",
    }
    duration_vals = {}
    for sep, unit in separators.items():
        partitioned = isostring.partition(sep)
        if partitioned[1] == sep:
            # Matched this unit
            isostring = partitioned[2]
            if sep == "PT":
                continue  # Successful prefix match
            dur_str = partitioned[0]
            dur_val = float(dur_str) if "." in dur_str else int(dur_str)
            duration_vals.update({unit: dur_val})
        else:
            if sep == "PT":
                raise ValueError("Missing PT prefix")
            else:
                # No match for this unit: it's absent
                duration_vals.update({unit: 0})
    if as_dict:
        return duration_vals
    else:
        return tuple(duration_vals.values())

dur_isostr = "PT3H2M59.989333S"
dur_tuple = parse_isoduration(dur_isostr)
dur_dict = parse_isoduration(dur_isostr, as_dict=True)
td = timedelta(**dur_dict)
s = td.total_seconds()
⇣
>>> dur_tuple
(0, 0, 3, 2, 59.989333)
>>> dur_dict
{'weeks': 0, 'days': 0, 'hours': 3, 'minutes': 2, 'seconds': 59.989333}
>>> td
datetime.timedelta(seconds=10979, microseconds=989333)
>>> s
10979.989333
Based on @r3robertson's answer, here is a more complete but still imperfect version:
import datetime

def parse_isoduration(s):
    """ Parse a str ISO-8601 Duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
    Originally copied from:
    https://stackoverflow.com/questions/36976138/is-there-an-easy-way-to-convert-iso-8601-duration-to-timedelta
    :param s: ISO 8601 duration string, e.g. "P1Y2M3DT4H5M6S"
    :return: datetime.timedelta (years and months are approximated)
    """
    def get_isosplit(s, split):
        if split in s:
            n, s = s.split(split, 1)
        else:
            n = '0'
        return n.replace(',', '.'), s  # to handle like "P0,5Y"

    s = s.split('P', 1)[-1]  # Remove prefix
    # Split the date part from the time part first, so that the "M" in "PT1M"
    # is read as minutes rather than months.
    s_date, _, s_time = s.partition('T')
    s_yr, s_date = get_isosplit(s_date, 'Y')  # Step through letter dividers
    s_mo, s_date = get_isosplit(s_date, 'M')
    s_dy, s_date = get_isosplit(s_date, 'D')
    s_hr, s_time = get_isosplit(s_time, 'H')
    s_mi, s_time = get_isosplit(s_time, 'M')
    s_sc, s_time = get_isosplit(s_time, 'S')
    n_yr = float(s_yr) * 365   # These are approximations that I can live with
    n_mo = float(s_mo) * 30.4  # But they are not correct!
    dt = datetime.timedelta(days=n_yr+n_mo+float(s_dy), hours=float(s_hr), minutes=float(s_mi), seconds=float(s_sc))
    return dt  # int(dt.total_seconds())  # original code wanted to return as seconds, we don't.
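This is my modification (of Martin's and rer's answers) to support the weeks attribute and return milliseconds. Some durations may use fractional seconds, such as PT15.460S.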
import datetime

def parse_isoduration(s):
    # https://stackoverflow.com/questions/36976138/is-there-an-easy-way-to-convert-iso-8601-duration-to-timedelta
    # Parse the ISO8601 duration as years, months, weeks, days, hours, minutes, seconds
    # Returns: milliseconds
    # Examples: "PT1H30M15.460S", "P5DT4M", "P2WT3H"
    def get_isosplit(s, split):
        if split in s:
            n, s = s.split(split, 1)
        else:
            n = '0'
        return n.replace(',', '.'), s  # to handle like "P0,5Y"

    s = s.split('P', 1)[-1]  # Remove prefix
    # Split the date part from the time part first, so that the "M" in "PT1M"
    # is read as minutes rather than months.
    s_date, _, s_time = s.partition('T')
    s_yr, s_date = get_isosplit(s_date, 'Y')  # Step through letter dividers
    s_mo, s_date = get_isosplit(s_date, 'M')
    s_wk, s_date = get_isosplit(s_date, 'W')
    s_dy, s_date = get_isosplit(s_date, 'D')
    s_hr, s_time = get_isosplit(s_time, 'H')
    s_mi, s_time = get_isosplit(s_time, 'M')
    s_sc, s_time = get_isosplit(s_time, 'S')
    n_yr = float(s_yr) * 365  # approx days for year, month, week
    n_mo = float(s_mo) * 30.4
    n_wk = float(s_wk) * 7
    dt = datetime.timedelta(days=n_yr+n_mo+n_wk+float(s_dy), hours=float(s_hr), minutes=float(s_mi), seconds=float(s_sc))
    # Integer division by one millisecond avoids float rounding in the result.
    return dt // datetime.timedelta(milliseconds=1)  # int(dt.total_seconds()) | dt
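On the milliseconds part: converting a timedelta to whole milliseconds does not have to go through floating-point seconds. A minimal sketch, where `to_milliseconds` is a hypothetical helper name and not part of the answer above; it relies on the fact that dividing one timedelta by another with `//` yields an int:

```python
import datetime

def to_milliseconds(td: datetime.timedelta) -> int:
    """Whole milliseconds in `td`, computed with exact integer arithmetic."""
    return td // datetime.timedelta(milliseconds=1)

example = datetime.timedelta(hours=1, minutes=30, seconds=15, milliseconds=460)
print(to_milliseconds(example))  # 5415460
```

Anything below one millisecond is dropped by the floor division, which is probably what you want for a value reported in milliseconds.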
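You can simply use a regular expression to parse ISO 8601 durations, without pulling in an external dependency.
The following works for the common D/H/M/S designators. Support for the Y/M/W designators is not implemented.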
import datetime
import re

def parse_iso8601_duration(duration: str) -> datetime.timedelta:
    # Each component is optional; a value is an integer or a decimal number.
    pattern = r"^P(?:(?P<days>\d+\.\d+|\d+)D)?T?(?:(?P<hours>\d+\.\d+|\d+)H)?(?:(?P<minutes>\d+\.\d+|\d+)M)?(?:(?P<seconds>\d+\.\d+|\d+)S)?$"
    match = re.match(pattern, duration)
    if not match:
        raise ValueError(f"Invalid ISO 8601 duration: {duration}")
    # Components missing from the input default to "0".
    parts = {k: float(v) for k, v in match.groupdict("0").items()}
    return datetime.timedelta(**parts)
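If you also need weeks, the same idea extends fairly naturally. The following is only a sketch under my own assumptions: the name `parse_duration_with_weeks` is made up, the pattern is stricter than the one above (it requires the "T" before any time component and rejects a bare "P"), and years and months are still deliberately unsupported.

```python
import datetime
import re

# Hypothetical extension: adds an optional weeks (W) group. The "T" separator
# is required before time components and must be followed by a digit.
_DURATION_RE = re.compile(
    r"^P(?!$)"
    r"(?:(?P<weeks>\d+(?:\.\d+)?)W)?"
    r"(?:(?P<days>\d+(?:\.\d+)?)D)?"
    r"(?:T(?=\d)"
    r"(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?"
    r")?$"
)

def parse_duration_with_weeks(duration: str) -> datetime.timedelta:
    match = _DURATION_RE.match(duration)
    if not match:
        raise ValueError(f"Invalid ISO 8601 duration: {duration!r}")
    # Groups that did not take part in the match default to "0".
    parts = {k: float(v) for k, v in match.groupdict("0").items()}
    return datetime.timedelta(**parts)

print(parse_duration_with_weeks("P2WT3H"))  # 14 days, 3:00:00
```

Because "M" is only accepted after the "T", an input such as "P1M" (one month) raises ValueError instead of being silently read as one minute.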