Regex only matches half

Question

I am trying to match the following data so that I can extract the text between the timecodes.

subs='''

1
00:00:00,130 --> 00:00:01,640

where you there when it 
happened?

Who else saw you?

2
00:00:01,640 --> 00:00:03,414


This might be your last chance to


come clean. Take it or leave it.
'''

Regex=re.compile(r'(\d\d:\d\d\:\d\d,\d\d\d) --> (\d\d:\d\d\:\d\d,\d\d\d)(\n.+)((\n)?).+')

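My regex matches the timecode line and the first line of text, but it returns only a few characters from the second line instead of the whole line. How can I capture everything between one timecode and the next?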

Answer 1

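I'm not sure, but I think the following solution may suit your case better. Note: with the solution below you can not only extract the text between the timecodes, but also associate each piece of text with its timecode.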

import re

multiline_text=\
"""

1 00:00:00,130 --> 00:00:01,640

where you there when it happened?

Who else saw you?

2 00:00:01,640 --> 00:00:03,414

This might be your last chance to

come clean. Take it or leave it.
"""

lines = multiline_text.split('\n')
subtitles = {}  # maps each timecode line to the text that follows it
current_key = None

for line in lines:
  # A timecode looks like 00:00:00,130 --> 00:00:01,640 (12 characters on each side).
  is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
  if is_key_match_obj:
    current_key = is_key_match_obj.group()
    continue

  if current_key:
    if current_key in subtitles:
      if not line:
        subtitles[current_key] += '\n'
      else:
        subtitles[current_key] += line
    else:
      subtitles[current_key] = line

print(subtitles)

Answer 2

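One possible problem with your current approach is that you are not using DOTALL mode when trying to capture everything between the timestamps. I got re.search working in DOTALL mode: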

import re

subs="""

1 00:00:00,130 --> 00:00:01,640

where you there when it happened?

Who else saw you?

2 00:00:01,640 --> 00:00:03,414

This might be your last chance to

come clean. Take it or leave it. """
match = re.search(r'\d+ \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+\s*(.*)\d+ \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+', subs, flags=re.DOTALL)
if match:
    print(match.group(1))
else:
    print('NO MATCH')

This prints:

where you there when it happened?

Who else saw you?
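
The search above returns only the text between the first two timestamps. As a minimal sketch of how the same DOTALL idea might be extended to every block, the snippet below uses a lazy group and a lookahead that stops at the next header or at the end of the string. The names HEADER, pattern and texts are made up for this illustration, and the sample assumes each header has the form "index timecode --> timecode" on one line, as in the data above.

```python
import re

subs = """
1 00:00:00,130 --> 00:00:01,640
where you there when it happened?
Who else saw you?

2 00:00:01,640 --> 00:00:03,414
This might be your last chance to
come clean. Take it or leave it.
"""

# Illustrative name: one header line, i.e. an index followed by two timecodes.
HEADER = r"\d+ \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+"

# (.*?) is lazy, so it stops at the first position where the lookahead
# succeeds: just before the next header, or at the end of the string (\Z).
# re.DOTALL lets the dot cross line boundaries.
pattern = re.compile(rf"{HEADER}\s*(.*?)(?=\s*{HEADER}|\s*\Z)", re.DOTALL)

# With a single capture group, findall returns a list of the captured texts.
texts = pattern.findall(subs)
for text in texts:
    print(repr(text))
```

Because the group is lazy rather than greedy, the last block is captured as well, which the single re.search call with a greedy (.*) does not attempt.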

Answer 3

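You can also get the match without using DOTALL.

Match the timecode, then capture in group 1 all of the following lines that do not start with a timecode, using a negative lookahead.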

^\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+((?:\r?\n(?!\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d).*)*)

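In parts

- ^ Start of the string (or of each line when the multiline flag is set)
- \d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+ Match the timecode pattern
- ( Capture group 1
  - (?: Non-capturing group
    - \r?\n Match a newline
    - (?!\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d) Negative lookahead, assert that what follows is not a timecode
    - .* Match any character except a newline, 0+ times
  - )* Close the non-capturing group and repeat it 0+ times
- ) Close capture group 1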
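
A rough sketch of how this pattern could be used from Python follows. It is a variation, not the exact pattern above: the timecode gets its own capture group (so the text lands in group 2), the lookahead also skips an index line such as "2" that directly precedes a timecode, and re.MULTILINE is set so that ^ matches at the start of each line. The names TIMECODE, pattern and blocks are made up for this example, and the sample data assumes the layout from the question (index line, timecode line, then text).

```python
import re

# Sample data in the question's layout: index line, timecode line, text lines.
subs = (
    "1\n"
    "00:00:00,130 --> 00:00:01,640\n"
    "where you there when it\n"
    "happened?\n"
    "\n"
    "Who else saw you?\n"
    "\n"
    "2\n"
    "00:00:01,640 --> 00:00:03,414\n"
    "This might be your last chance to\n"
    "come clean. Take it or leave it.\n"
)

# Illustrative name: the timecode line without the leading index.
TIMECODE = r"\d{2}:\d{2}:\d{2},\d+ --> \d{2}:\d{2}:\d{2},\d+"

# Group 1: the timecode. Group 2: every following line that neither is a
# timecode nor is an index line immediately followed by a timecode.
pattern = re.compile(
    rf"^({TIMECODE})((?:\r?\n(?!(?:\d+\r?\n)?{TIMECODE}).*)*)",
    re.MULTILINE,
)

# strip() drops the blank lines surrounding each block of text.
blocks = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(subs)]
for timecode, text in blocks:
    print(timecode)
    print(text)
    print("---")
```

Without the extra index-line check in the lookahead, the "2" that precedes the second timecode would be captured as part of the first block's text, since that line does not itself start with a timecode.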

