Regex pattern to exclude timestamps

Problem description

I have the following text:

Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n(Interviewee: Marina)\n\n\n\n(00:00:05 - 00:00:09)\n\n\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n(00:00:11 - 00:00:14)\n\n\t Alice: Uh uh impartially.\n\n\n\n(00:00:15 - 00:00:18)\n\n\t Alice: Uh so suddenly I don't work for a particular brand.\n\n\n\n(00:00:19 - 00:00:21)\n\n\t Alice: Uh I'm self-employed,\n\n\n\n(00:00:21 - 00:00:21)\n\n\t Marina: M M.\n\n\n\n(00:00:21 - 00:00:32)\n\n\t Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.\n\n\n\n(00:00:32 - 00:00:32)\n\n\t Marina: Okay.\n\n\n\n(00:00:33 - 00:00:37)\n\n\t Alice: Uh today we're gonna talk for an hour uh.\n\n\n\n(00:00:36 - 00:00:36)\n\n\t Marina: Okay.\n\n\n\n(00:00:37 - 00:00:39)\n\n\t 

From the text above, I want to extract the "name: text" pairs. For example:

Alice: This project. Uh my job is to ask lots of questions.
Marina: What is it?
Alice: Uh uh impartially.
Alice: Uh so suddenly I don't work for a particular brand.
Alice: Uh I'm self-employed,
Marina: M M.
Alice: I do group interviews with lots of brands, from toothpaste to the product we're going to talk about today.
Marina: Okay.
Alice: Uh today we're gonna talk for an hour uh.
Marina: Okay.

With the following regex I can identify the timestamps, but I cannot exclude them:

(?:[\\n]+\(\d{2}:\d{2}:\d{2} - \d{2}:\d{2}:\d{2}\)[\\n\\t\\s]+|$)

I need a regex pattern that excludes all timestamps and other text and keeps only the "name: text" pairs shown above.

P.S. I do not want Python code that uses the pattern above in a regex substitution. I just want one complete pattern that finds the "name: text" matches.
python regex ms-word docx regex-group
1 Answer

I would use re to do something like this:

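import re

# input_text is assumed to hold the transcript from the question,
# with real newline and tab characters.
pattern = r'(\w+): (.+)'

# findall returns one (name, text) tuple per match
matches = re.findall(pattern, input_text)

for name, text in matches:
    print(f"{name}: {text}")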

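This prints each "name: text" pair. Note that the pattern also matches the "(Interviewee: Marina)" header line, so that entry has to be skipped or the pattern anchored more tightly. Hope this helps.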
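Because a bare `(\w+): (.+)` also picks up the "(Interviewee: Marina)" header, one possible tightening is to require the indentation that precedes every speaker line in the sample. The sketch below is not the original answer's method. It assumes the "\n" and "\t" shown in the question are real newline and tab characters, and it uses a shortened copy of the sample text. If the string actually contains literal backslash sequences, the pattern would need adjusting.

```python
import re

# Shortened sample from the question (assumption: real newlines and tabs).
input_text = (
    "Master of the universe\n\n(Jul 26, 2023 - 1:00pm)\n\n"
    "(Interviewee: Marina)\n\n\n\n(00:00:05 - 00:00:09)\n\n"
    "\t Alice: This project. Uh my job is to ask lots of questions.\n\n\n\n"
    "(00:00:10 - 00:00:11)\n\n\t Marina: What is it?\n\n\n\n"
    "(00:00:37 - 00:00:39)\n\n\t "
)

# ^[ \t]+ : the line must start with indentation; timestamps and header
#           lines start with "(" or a letter, so they are skipped
# (\w+)   : speaker name
# ": "    : literal colon and space
# (.+)$   : the rest of the line
pattern = re.compile(r"^[ \t]+(\w+): (.+)$", re.MULTILINE)

pairs = pattern.findall(input_text)

for name, text in pairs:
    print(f"{name}: {text}")
# Alice: This project. Uh my job is to ask lots of questions.
# Marina: What is it?
```

The `re.MULTILINE` flag makes `^` and `$` match at every line boundary, so the timestamps never need to be matched or removed; they simply fail the indentation requirement.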

