The lines to match are:
part1a_part1b__part1c_part1d_part3.extension
part1a_part1b__part1c_part1d__part3.extension
part1a_part1b__part1c_part1d_part2short_part3.extension
part1a_part1b__part1c_part1d_part2short__part3.extension
part1a_part1b__part1c_part1d_part2_part3.extension
part1a_part1b__part1c_part1d_part2__part3.extension
part1a_part1b__part1c_part1d_part2full_part3.extension
part1a_part1b__part1c_part1d_part2full__part3.extension
part1a_part1b__part1c_part1d_part2short-part3.extension
part1a_part1b__part1c_part1d_part2-part3.extension
part1a_part1b__part1c_part1d_part2full-part3.extension
part1a_part1b__part1c_part1d_part4.extension
part1a_part1b__part1c_part1d__part4.extension
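Except for the last two lines, the desired match should yield exactly part1a_part1b__part1c_part1d for every line above. That is, a "stem" made of any number of part1 pieces, then an optional part2 (in a limited set of forms), and the line must end with part3.extension.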
So far, I have only got as far as
(?P<stem>[[:alnum:]_-]+)(?=(|part2short|part2|part2full))[_-]+part3\.extension
for which the "stem" values matched on the lines above are:
part1a_part1b__part1c_part1d
part1a_part1b__part1c_part1d_
part1a_part1b__part1c_part1d_part2short
part1a_part1b__part1c_part1d_part2short_
part1a_part1b__part1c_part1d_part2
part1a_part1b__part1c_part1d_part2_
part1a_part1b__part1c_part1d_part2full
part1a_part1b__part1c_part1d_part2full_
part1a_part1b__part1c_part1d_part2short
part1a_part1b__part1c_part1d_part2
part1a_part1b__part1c_part1d_part2full
Could anyone comment on how to match part1a_part1b__part1c_part1d in all of the lines above except the last two, if that is possible?
You can use this regex, which combines a non-greedy match with a lookahead that contains an optional group:
(?m)^(?P<stem>[[:alnum:]_-]+?)(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)
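(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)
is a positive lookahead asserting that what follows is an optional [_-]+part2, part2short or part2full, then [_-]+part3.extension at the end of the line. Because the stem is matched lazily with +?, it stops at the first position where this lookahead succeeds, so the optional part2 and the separators stay out of the group.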
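As a quick check, here is a minimal Python sketch that runs the pattern above against the sample lines from the question. One assumption to flag: Python's re module does not support POSIX classes such as [[:alnum:]], so the class is spelled out as [A-Za-z0-9_-] here; in a PCRE-style engine the original form should work as written. The names STEM_RE, SUFFIXES and stem_of are just for illustration.

```python
import re

# Python's re has no [[:alnum:]], so the class is written out explicitly
# (an assumption about the intended character set).
STEM_RE = re.compile(
    r"(?m)^(?P<stem>[A-Za-z0-9_-]+?)"
    r"(?=(?:[_-]+part2(?:short|full)?)?[_-]+part3\.extension$)"
)

PREFIX = "part1a_part1b__part1c_part1d"

# Tails of the sample lines from the question, in the same order.
SUFFIXES = [
    "_part3", "__part3",
    "_part2short_part3", "_part2short__part3",
    "_part2_part3", "_part2__part3",
    "_part2full_part3", "_part2full__part3",
    "_part2short-part3", "_part2-part3", "_part2full-part3",
    "_part4", "__part4",
]


def stem_of(line):
    """Return the captured stem, or None when the line does not match."""
    match = STEM_RE.search(line)
    return match.group("stem") if match else None


results = [stem_of(PREFIX + suffix + ".extension") for suffix in SUFFIXES]

for suffix, stem in zip(SUFFIXES, results):
    print(f"{PREFIX}{suffix}.extension -> {stem}")
```

The first eleven lines should all report part1a_part1b__part1c_part1d as the stem, and the two part4 lines should report None.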
You can match the first 4 parts as text separated by underscores, and use a positive lookahead that asserts that the string ends with part3.extension:
^(?P<stem>[^_]+_[^_]+__[^_]+_[^_]+)(?=.*part3\.extension$)
This will match:
^                         # Begin of the string
(?P<stem>                 # Named captured group stem
  [^_]+_                  # Match not _ one or more times, then _
  [^_]+__                 # Match not _ one or more times, then __
  [^_]+_                  # Match not _ one or more times, then _
  [^_]+                   # Match not _ one or more times
)                         # Close named capturing group
(?=                       # A positive lookahead that asserts what follows
  .*part3\.extension$     # Match part3.extension at the end of the string
)                         # Close lookahead
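For anyone who wants to try it, the same pattern can be written in Python with re.VERBOSE so the comments live next to the pieces they describe. This is only a sketch; FIXED_STEM_RE and fixed_stem_of are made-up names for the example.

```python
import re

FIXED_STEM_RE = re.compile(
    r"""
    ^                        # start of the string
    (?P<stem>                # named group "stem"
        [^_]+_               # first part, then a single underscore
        [^_]+__              # second part, then a double underscore
        [^_]+_               # third part, then a single underscore
        [^_]+                # fourth part
    )
    (?=.*part3\.extension$)  # the string must end with part3.extension
    """,
    re.VERBOSE,
)


def fixed_stem_of(line):
    """Return the four-part stem, or None when the line does not match."""
    match = FIXED_STEM_RE.search(line)
    return match.group("stem") if match else None


samples = [
    "part1a_part1b__part1c_part1d_part3.extension",
    "part1a_part1b__part1c_part1d_part2short-part3.extension",
    "part1a_part1b__part1c_part1d__part4.extension",
]

for sample in samples:
    print(sample, "->", fixed_stem_of(sample))
```

Note that this version assumes the stem always has exactly this four-part layout, and the lookahead does not check what sits between the stem and part3.extension, so a line such as part1a_part1b__part1c_part1d_bogus_part3.extension would also be accepted. If part2 has to be validated, the optional group needs to go into the lookahead as well.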