Python 2.7: Matching an expression with a regular expression

Question

I have the following strings:

asc_epsWarn_mu8                  # I want asc and epsWarn 
asc_ger_phiK_mi16                # I want asc and ger_phiK
ARSrt_FAC_RED5_DSR_AU16            # I want ARSrt and FAC_RED5_DSR

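Basically, I want the characters before the first underscore in the first group, and everything between the first and the last underscore in the second group.

I am new to regular expressions. Is it possible to write a single regex that handles all of the strings above? The best I could come up with is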

(\w+)_(\w+)_(\w+)

but it does not work. What is the correct regex?

Answer 1

You can use this regex with 2 capture groups:

^([^_]+)_(.+)_[^_]*$


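RegEx details:

^: Start of the string
([^_]+): Capture group #1, matching 1 or more non-underscore characters
_: Match a literal underscore
(.+): Capture group #2, greedily matching 1 or more of any character; backtracking leaves the next _ as the last underscore in the string
_: Match a literal underscore
[^_]*: Match 0 or more non-underscore characters
$: End of the string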
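
As a quick check, the pattern above can be run against the three strings from the question. This is only a minimal sketch: the names `PATTERN`, `samples` and `results` are illustrative and do not come from the answer.

```python
import re

# Pattern from the answer above: group 1 is the text before the first
# underscore, group 2 is everything between the first and last underscore.
PATTERN = re.compile(r'^([^_]+)_(.+)_[^_]*$')

# Sample strings taken from the question.
samples = ["asc_epsWarn_mu8", "asc_ger_phiK_mi16", "ARSrt_FAC_RED5_DSR_AU16"]

# Collect (group 1, group 2) for every sample.
results = [PATTERN.match(s).groups() for s in samples]

for s, groups in zip(samples, results):
    print("%s -> %s, %s" % (s, groups[0], groups[1]))
```

For these inputs the second group comes out as epsWarn, ger_phiK and FAC_RED5_DSR respectively.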

Answer 2

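The word character \w also matches an underscore.

If you want to match word characters without the underscore, you can use a negated character class that excludes both non-word characters and the underscore: [^\W_]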
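
To see why this matters, here is a small sketch that runs the pattern from the question on the longest sample string and then compares it with [^\W_]. The variable names `attempt`, `groups` and `first` are made up for this illustration.

```python
import re

# The pattern from the question: every \w+ can also swallow underscores,
# so the greedy first group runs past the first underscore.
attempt = re.compile(r'(\w+)_(\w+)_(\w+)')

groups = attempt.match("ARSrt_FAC_RED5_DSR_AU16").groups()
print(groups)

# [^\W_] is \w minus the underscore, so it stops at the first underscore.
first = re.match(r'[^\W_]+', "ARSrt_FAC_RED5_DSR_AU16").group(0)
print(first)
```

The question's pattern does match, but it splits the string as ARSrt_FAC_RED5, DSR and AU16, which is not the grouping that was asked for.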

For the second group, you can use a repeating pattern. The full regex has 2 capturing groups:

^([^\W_]+)_((?:[^\W_]+_)*)[^\W_]+$

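^ Start of the string
([^\W_]+)_ Capture 1 or more word characters except the underscore in group 1, then match an underscore
( Open capture group 2; (?:[^\W_]+_)* repeats 0 or more times: 1 or more word characters except the underscore, followed by an underscore
) Close group 2
[^\W_]+ Match 1 or more word characters except the underscore
$ End of the string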
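
One detail worth checking when trying this pattern: because each repetition inside group 2 ends with an underscore, the captured text keeps a trailing underscore. The sketch below shows this and two possible ways around it. The `variant` pattern is a hypothetical adjustment, not part of the answer, and all variable names are illustrative.

```python
import re

# Pattern from the answer above.
original = re.compile(r'^([^\W_]+)_((?:[^\W_]+_)*)[^\W_]+$')

m = original.match("ARSrt_FAC_RED5_DSR_AU16")
group2_raw = m.group(2)                   # keeps the trailing underscore
group2_stripped = group2_raw.rstrip('_')  # option 1: strip it afterwards

# Option 2 (hypothetical variant, not from the answer): keep the last
# underscore outside group 2 so no post-processing is needed.
variant = re.compile(r'^([^\W_]+)_((?:[^\W_]+_)*[^\W_]+)_[^\W_]+$')
group2_variant = variant.match("ARSrt_FAC_RED5_DSR_AU16").group(2)

print(group2_raw)
print(group2_stripped)
print(group2_variant)
```

Note that the variant requires at least two underscores in the input, whereas the original pattern also accepts a string with a single underscore (with an empty group 2).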


Answer 3

Try using this pattern:

([^_]+)_(.*)_.*

Sample script:

import re

text = "ARSrt_FAC_RED5_DSR_AU16"
matches = re.match(r'([^_]+)_(.*)_.*', text)
if matches:  # re.match returns None when the pattern does not match
    print "part1: ", matches.group(1)
    print "part2: ", matches.group(2)

part1:  ARSrt
part2:  FAC_RED5_DSR

Here is a brief explanation of the regex pattern:

([^_]+) match and capture the term before the first underscore
_       match a literal underscore
(.*)    then greedily match and consume everything up until the last underscore
_       match the last underscore
.*      consume the remainder of the string
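
If the pattern is going to be applied to many names, it may help to wrap it in a small function that handles the no-match case explicitly. The following is a sketch under that assumption; the function name `split_name` is hypothetical, and the code is written to run on both Python 2.7 and Python 3.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r'([^_]+)_(.*)_.*')

def split_name(name):
    """Return (prefix, middle), or None when the pattern does not match."""
    m = PATTERN.match(name)
    if m is None:
        # Fewer than two underscores, or a leading underscore.
        return None
    return m.group(1), m.group(2)

print(split_name("ARSrt_FAC_RED5_DSR_AU16"))
print(split_name("asc_mu8"))
```

A string such as asc_mu8 contains only one underscore, so the pattern cannot match and the function returns None instead of raising an AttributeError on `.group()`.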
