Parsing a weirdly formatted time expression from a string

Question

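I'm trying to parse a formatted string. I need to know how many hours, minutes and seconds have been worked on each project I retrieve.

The data I receive comes in this format, for example:

PT5H12M3S, which means 5 hours, 12 minutes and 3 seconds.

However, if less than an hour has been worked, the hours are not shown at all:

PT12M3S, which means 12 minutes and 3 seconds.

On top of that, if nothing has been worked on the project (or less than a minute), the data is shown as:

PT0S

If a project only has whole hours worked on it, it is shown as:

PT5H

I tried parsing the data with the following code: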

estimated = track_data['project']['estimate']['estimate'].split('PT')[1]
estimated_hours = estimated.split('H')[0]
estimated_minutes = estimated_hours.split('M')[0]
estimated_seconds = estimated_minutes.split('S')[0]

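But this solution only works if the data is in the format PT5H12M3S. For every other format it goes wrong. For example, if I get PT5H, the estimated hours will be 5, but the estimated minutes and seconds will also be 5. Obviously that is not what I want.

Can anyone point me to where I should look? I tried a few other things with split, but they don't seem to work: if 'M' or 'S' is not found, the same number is simply repeated.

I hope this makes sense, and thanks in advance.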

Answer 1

You can use a regular expression:

import re

PROJECT_TIME_REGEX = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

def get_project_time(s):
    m = PROJECT_TIME_REGEX.match(s)
    if not m:
        raise ValueError('invalid string')
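    # Groups that did not take part in the match are None; treat them as 0.
    # (Named hours/minutes/seconds to avoid shadowing the built-in min.)
    hours, minutes, seconds = (int(g) if g is not None else 0 for g in m.groups())
    return hours, minutes, seconds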

print(get_project_time('PT5H12M3S'))
# (5, 12, 3)
print(get_project_time('PT12M3S'))
# (0, 12, 3)
print(get_project_time('PT0S'))
# (0, 0, 0)
print(get_project_time('PT5H'))
# (5, 0, 0)
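
The input looks like an ISO 8601 duration limited to hours, minutes and seconds. If a single value is handier than a tuple, a minimal sketch along the same lines could return a datetime.timedelta. The name parse_duration is made up for illustration, and the switch to fullmatch is my own choice rather than part of the answer above: it rejects trailing garbage such as PT5Hxyz instead of silently ignoring it.

```python
import re
from datetime import timedelta

# Same pattern as above; only the H/M/S subset of the format is handled.
DURATION_REGEX = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

def parse_duration(s):  # hypothetical helper name
    # fullmatch requires the whole string to fit the pattern
    m = DURATION_REGEX.fullmatch(s)
    if not m:
        raise ValueError('invalid string: {!r}'.format(s))
    hours, minutes, seconds = (int(g) if g is not None else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

print(parse_duration('PT5H12M3S'))                # 5:12:03
print(parse_duration('PT12M3S').total_seconds())  # 723.0
```

Note that a bare 'PT' still passes, since every component is optional; add an explicit check if that should count as invalid.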

Answer 2

How about this?

import re

def parsept(ptstring):
    regex = re.compile(
            r'PT'
            r'(?:(?P<h>\d+)H)?'
            r'(?:(?P<m>\d+)M)?'
            r'(?:(?P<s>\d+)S)?')
    m = regex.match(ptstring)
    if m:
        return (int(m.group('h')) if m.group('h') else 0,
                int(m.group('m')) if m.group('m') else 0,
                int(m.group('s')) if m.group('s') else 0)
    # else
    raise ValueError('{0} does not look like a valid PTxHyMzS string'.format(ptstring))

Answer 3

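You can use a regular expression with groups to capture the hours, minutes and seconds, all of which can be optional.

Something like: /PT(\d*)H?(\d*)M?(\d*)S?/

The parentheses are capture groups, so your capture groups will contain the hours, minutes and seconds (all of them optional).

But regular expressions aren't all that readable. I'd strongly recommend trying a parser combinator library like Parsec. Parser combinators are more readable and maintainable, and a pleasure to work with.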
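
One caveat with the pattern as written: the unit letters are optional independently of the digits, so a string without hours lets the first group grab the minutes. A quick check in Python illustrates this. The first pattern is copied from above; the second one, which ties each letter to its digits, is only one possible fix.

```python
import re

loose = re.compile(r'PT(\d*)H?(\d*)M?(\d*)S?')
strict = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

# With the loose pattern the first group swallows "12", so the minutes
# end up in the hours slot.
print(loose.match('PT12M3S').groups())   # ('12', '', '3')

# Grouping each number with its unit keeps the slots aligned.
print(strict.match('PT12M3S').groups())  # (None, '12', '3')
```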

Answer 4

A solution without regular expressions, based on conditionals:

def parse_time(str_to_parse):
    str_to_parse = str_to_parse.split('PT')[1]
    time_units = ['H', 'M', 'S'] #this needs to always be in left to right or bigger to smaller order
    estimated_time = {k: 0 for k in time_units} 
    for k in time_units:
        if k in str_to_parse:
            left, right = str_to_parse.split(k)
            estimated_time[k], str_to_parse = int(left), right
    return estimated_time

estimated = "PT12M3S"
final_time = parse_time(estimated)
print(final_time)
# {'H': 0, 'M': 12, 'S': 3}

Answer 5

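I hope this code makes sense. It is a very simple approach: loop over the characters of the string, append digits to current, and evaluate them when you reach an alphabetic character ('S', 'M', 'H').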

estimated = 'PT5H'
clean = estimated.split('PT')[1]
seconds = 0
minutes = 0
hours = 0
current = ''

for char in clean:
    if char.isdigit():
        current += char
    else:
        if char == 'S':
            seconds = int(current)
        elif char == 'M':
            minutes = int(current)
        else:
            hours = int(current)

        current = ''

print(hours, minutes, seconds)
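
The same loop can be wrapped in a function so it is reusable. In the sketch below the name parse_pt and the error handling are my additions: in the loop above any non-digit character falls into the final else branch and is treated as hours, so this variant raises on unknown letters instead.

```python
def parse_pt(text):  # hypothetical name
    if not text.startswith('PT'):
        raise ValueError('expected a string starting with PT: {!r}'.format(text))
    totals = {'H': 0, 'M': 0, 'S': 0}
    current = ''
    for char in text[2:]:
        if char.isdigit():
            current += char  # keep collecting digits
        elif char in totals and current:
            # a known unit letter closes the number collected so far
            totals[char] = int(current)
            current = ''
        else:
            raise ValueError('unexpected character {!r} in {!r}'.format(char, text))
    if current:
        raise ValueError('trailing digits without a unit in {!r}'.format(text))
    return totals['H'], totals['M'], totals['S']

print(parse_pt('PT5H'))       # (5, 0, 0)
print(parse_pt('PT5H12M3S'))  # (5, 12, 3)
```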
