.NET regex - strings that don't contain a full stop in the last list item


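I'm trying to use a .NET regular expression to identify strings in XML data that don't contain a full stop before the last tag, so that data with a full stop there can be flagged. I don't have much experience with regular expressions, and I'm not sure what I need to change, or why, to get the result I want.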

Every line in the data ends with a carriage return and a line feed (\r\n).

Example of good XML data:

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>

Example of bad XML data (the regex should match it, because there is a full stop before the last </item>):

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc.</item>
</randlist>
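To make the requirement concrete, here is a minimal sketch of a pattern that only looks at the last item: a full stop, then </item>, then nothing but whitespace before </randlist>. It is written with Python's re module rather than .NET, and the sample strings and the helper name last_item_has_full_stop are hypothetical. The tokens it uses (\., \s*, literal characters) have the same meaning in .NET's Regex class, but treat it as untested there. Because \s also matches \r and \n, it should not care whether lines end in CRLF or LF.

```python
import re

# Hypothetical sample data mirroring the two examples above (LF line endings).
GOOD = """<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>"""
BAD = GOOD.replace("<item>abc</item>\n</randlist>", "<item>abc.</item>\n</randlist>")

# A full stop directly before a </item> that is followed only by whitespace
# (spaces, \r, \n) and then </randlist>, i.e. the last item of the list.
LAST_ITEM_ENDS_WITH_STOP = re.compile(r"\.</item>\s*</randlist>")

def last_item_has_full_stop(xml: str) -> bool:
    """Hypothetical helper: True if the last <item> text ends with '.'."""
    return LAST_ITEM_ENDS_WITH_STOP.search(xml) is not None

print(last_item_has_full_stop(GOOD))                       # False
print(last_item_has_full_stop(BAD))                        # True
print(last_item_has_full_stop(BAD.replace("\n", "\r\n")))  # True
```

Full stops in earlier items are deliberately ignored by this sketch; only the last item is checked.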

The regex pattern I tried doesn't work on the bad XML data (I haven't tested it against the good XML data):

^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$

Result using http://regexstorm.net/tester:

0 matches

Result using https://regex101.com/:

0 matches

IMO this question is different from the one below because of the full-stop and start-of-string conditions:

Regex for string not ending with given suffix

regex101's explanation of the pattern:

/^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$/gm
^ asserts position at start of a line
<randlist  matches the characters <randlist  literally (case sensitive)
\w* matches any word character (equal to [a-zA-Z0-9_])
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Match a single character present in the list below [\S\s]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\.* matches the character . literally (case sensitive)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^.]
. matches the character . literally (case sensitive)
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
item> matches the characters item> literally (case sensitive)
Match a single character present in the list below [\n]*
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
randlist> matches the characters randlist> literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
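The token-by-token explanation above doesn't show why the pattern reports 0 matches on the bad data. A quick reproduction with Python's re module suggests the pattern is inverted: [^.]<\/item> requires a character other than a full stop right before the last </item>, so it matches the good example and rejects the bad one. Separately, if the \r from the CRLF line endings is present in the test input, [\n]* cannot consume it, so even the good data fails. This assumes these particular tokens behave the same way in Python and .NET for this input; the sample strings are hypothetical copies of the examples.

```python
import re

# The pattern from the question, unchanged, with the m (multiline) flag.
ATTEMPT = re.compile(r"^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$", re.MULTILINE)

GOOD = """<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>"""
BAD = GOOD.replace("<item>abc</item>\n</randlist>", "<item>abc.</item>\n</randlist>")

def matches(text: str) -> bool:
    return ATTEMPT.search(text) is not None

print(matches(GOOD))                        # True: [^.] matches the "c" before the last </item>
print(matches(BAD))                         # False: the character before the last </item> is "."
print(matches(GOOD.replace("\n", "\r\n")))  # False: [\n]* cannot consume the \r
```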
Tags: .net, regex, regex-negation
1 Answer

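@Silvanas is absolutely right. You shouldn't use a regex to solve this; use some form of XML parser to read the data and find the lines with a full stop. However, if for some terrible reason you must use a regex, and the data is structured exactly like the examples, a regex solution is: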

^\s+<item>[^<]*?(?<=\.)<\/item>$
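A minimal sketch of the answer's pattern in use, written with Python's re module rather than .NET (re.MULTILINE plays the role of .NET's RegexOptions.Multiline). The \r? variant is an assumption added for the CRLF line endings mentioned in the question: in multiline mode $ matches before \n but not before \r, so on CRLF data the unchanged pattern finds nothing. The sample strings are hypothetical copies of the examples.

```python
import re

GOOD = """<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>"""
BAD = GOOD.replace("<item>abc</item>\n</randlist>", "<item>abc.</item>\n</randlist>")
GOOD_CRLF, BAD_CRLF = GOOD.replace("\n", "\r\n"), BAD.replace("\n", "\r\n")

# The answer's pattern, unchanged.
ANSWER = re.compile(r"^\s+<item>[^<]*?(?<=\.)<\/item>$", re.MULTILINE)
# Same pattern with an optional \r before $ (assumption: the data really has CRLF endings).
ANSWER_CRLF = re.compile(r"^\s+<item>[^<]*?(?<=\.)<\/item>\r?$", re.MULTILINE)

print(bool(ANSWER.search(GOOD)))            # False
print(bool(ANSWER.search(BAD)))             # True
print(bool(ANSWER.search(BAD_CRLF)))        # False: $ cannot match between </item> and \r
print(bool(ANSWER_CRLF.search(BAD_CRLF)))   # True
print(bool(ANSWER_CRLF.search(GOOD_CRLF)))  # False
```

As written, the pattern flags a full stop at the end of any <item> line, not only the last one.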

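If this regex finds any match, your XML is malformed. But again, this regex will also fail if the whitespace isn't exactly as expected, if there is other content on the line, if the tags aren't <item>..</item>, and so on. So, unless you can absolutely guarantee that everything apart from the full stop will be well-formed XML, you're better off not solving this with a regex.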
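Following the answer's advice, here is a minimal parser-based sketch. It uses Python's xml.etree.ElementTree as a stand-in (in .NET, System.Xml.Linq's XDocument would be the natural counterpart) and assumes, as in the examples, that the root element is <randlist> with <item> children; the function name last_item_ends_with_full_stop is hypothetical. Indentation and line endings stop mattering, because the parser handles them and the item text is stripped before the check.

```python
import xml.etree.ElementTree as ET

def last_item_ends_with_full_stop(xml_text: str) -> bool:
    """Hypothetical helper: True if the text of the last <item> ends with '.'."""
    root = ET.fromstring(xml_text)   # root is the <randlist> element
    items = root.findall("item")     # direct <item> children, in document order
    if not items:
        return False
    last_text = (items[-1].text or "").strip()
    return last_text.endswith(".")

GOOD = """<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>"""
BAD = GOOD.replace("<item>abc</item>\n</randlist>", "<item>abc.</item>\n</randlist>")

print(last_item_ends_with_full_stop(GOOD))                       # False
print(last_item_ends_with_full_stop(BAD))                        # True
print(last_item_ends_with_full_stop(BAD.replace("\n", "\r\n")))  # True
```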
