Extracting words from a string that follows a specific pattern


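From a given string, I want to extract only the alphanumeric words and skip any word enclosed between a pair of ":" characters. The extraction should work whether or not there is whitespace between a ":" and the adjacent alphanumeric text. Here is my code sample: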

import re

message = "ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"
message1 = ":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"
# Regex pattern to extract words that do not start and end with colons
pattern = r'(?<!:)(?::[^:]+:)*([^:]+)(?::[^:]+:)*(?!:)'

# Find all occurrences of words in the message that do not start and end with colons
words_without_colons = re.findall(pattern, message)
words_without_colons1 = re.findall(pattern, message1)
print(words_without_colons)
print(words_without_colons1)

Actual output:

op1 : ['ass', 'ass dof', 'or']

op2 : ['ifs_e4VLc8f2_galabing', 'tickers_t3B0l2J7_galabing']

Expected output:

op1 : ['ass', 'ass dof', 'dor']

op2 : [] # empty list

python regex pattern-matching
1 Answer

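It may be easier to use re.split here. The delimiter is a pair of colons enclosing an unbroken run of characters (no spaces, no colons), with an optional leading/trailing space: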

import re

pattern  = r" ?:[^ :]*?: ?"

message  = "ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"
message1 = ":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"

words  = list(filter(None, re.split(pattern, message)))   # drop empty pieces
words1 = list(filter(None, re.split(pattern, message1)))

print(words)  # ['ass', 'ass dof', 'dor']
print(words1) # []
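
If you need this in more than one place, the same split can be wrapped in a small function. This is only a sketch: the names `TOKEN` and `words_outside_colons` are made up for illustration, and the pattern is the one above, so it still assumes the text between the colons contains no spaces.

```python
import re

# Same delimiter as above: ":token:" with an optional space on either side.
# TOKEN and words_outside_colons are illustrative names, not from the question.
TOKEN = re.compile(r" ?:[^ :]*?: ?")

def words_outside_colons(text):
    """Split text on :token: delimiters and drop the empty pieces."""
    return [piece for piece in TOKEN.split(text) if piece]

print(words_outside_colons("ass :gifs_e4VLc8f2_galabingo: ass dof:stickers_t3B0l2J7_galabingo:dor"))
print(words_outside_colons(":gifs_e4VLc8f2_galabingo::stickers_t3B0l2J7_galabingo:"))
```

As for why the original pattern loses a character: the lookbehind `(?<!:)` rejects any match that starts immediately after a colon, so the first letter following a closing ":" is skipped. Splitting on the delimiter avoids that problem because nothing has to be matched relative to the colons.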