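I'm trying to use a .NET regular expression to identify strings in XML data that do not contain a full stop before the last tag. I don't have much experience with regular expressions, and I'm not sure what I need to change, or why, to get the result I want.
Each line of the data ends with a line feed and a carriage return.
Example of good XML data: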
<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc</item>
</randlist>
Example of bad XML data (the regexp should match this), with a full stop before the last </item>:
<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc.</item>
</randlist>
The regexp pattern I tried, which does not work on the bad XML data (not tested against the good XML data):
^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$
Result using http://regexstorm.net/tester:
0 matches
Result using https://regex101.com/:
0 matches
Because of the full stop and the start-of-string condition, this question is, IMO, different from:
Regex for string not ending with given suffix
Explanation from regex101:
/^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$/gm
^ asserts position at start of a line
<randlist matches the characters <randlist literally (case sensitive)
\w* matches any word character (equal to [a-zA-Z0-9_])
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Match a single character present in the list below [\S\s]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\.* matches the character . literally (case sensitive)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^.]
. matches the character . literally (case sensitive)
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
item> matches the characters item> literally (case sensitive)
Match a single character present in the list below [\n]*
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
randlist> matches the characters randlist> literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
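For what it's worth, the 0 matches can be reproduced outside the online testers. The sketch below is Python rather than .NET (an assumption made only so that it is easy to run; as far as I know the constructs in this particular pattern behave the same way in Python's re module). The sample strings and the helper name matches are made up for illustration. It runs the unchanged pattern against both samples, once with LF-only and once with CRLF line endings.

```python
import re

# The pattern from the question, unchanged, with the multiline ("m") flag.
PATTERN = re.compile(
    r'^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$',
    re.MULTILINE,
)

# Sample data rebuilt from the question (hypothetical variable names).
GOOD_LINES = [
    '<randlist prefix="unorder">',
    '<item>abc</item>',
    '<item>abc</item>',
    '<item>abc</item>',
    '</randlist>',
]
# Same data, but the last item ends with a full stop.
BAD_LINES = GOOD_LINES[:3] + ['<item>abc.</item>'] + GOOD_LINES[4:]


def matches(lines, newline):
    """Join the lines with the given line ending and report whether the pattern matches."""
    return PATTERN.search(newline.join(lines) + newline) is not None


results = {
    "good, LF": matches(GOOD_LINES, "\n"),
    "bad, LF": matches(BAD_LINES, "\n"),
    "good, CRLF": matches(GOOD_LINES, "\r\n"),
    "bad, CRLF": matches(BAD_LINES, "\r\n"),
}

for label, matched in results.items():
    print(f"{label}: {'match' if matched else 'no match'}")
```

Only the LF-only sample without a full stop matches. Two things seem to be at work: [^.] requires the character right before the last </item> to be something other than a full stop, so the sample ending in abc.</item> cannot match; and neither [\n]* nor $ accounts for the carriage return, so nothing matches once the lines end in CRLF.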
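@Silvanas is absolutely right. You should not use regex for this; instead, use some form of XML parser to read the data and look for the lines that contain a full stop. However, if for some horrible reason you must use regex, and if the data is structured exactly like your example, then the regex solution is: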
^\s+<item>[^<]*?(?<=\.)<\/item>$
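If that regex produces any matches, your XML is malformed. But again, if the whitespace is off, if there is anything else on the line, if the tags aren't <item>..</item>, and so on, this regex will fail too. Once more: unless you can absolutely guarantee that everything apart from the full stop will be well-formed XML, it is best not to use regex for this problem.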
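To make the parser suggestion concrete, here is a minimal sketch. It is written in Python with the standard-library xml.etree.ElementTree, which is an assumption for illustration only; in .NET you would use one of the framework's own XML APIs instead. The function name and the sample strings are hypothetical. It flags every randlist whose last item ends with a full stop, regardless of indentation or line endings.

```python
import xml.etree.ElementTree as ET


def randlists_with_trailing_full_stop(xml_text):
    """Return the <randlist> elements whose last <item> ends with a full stop."""
    root = ET.fromstring(xml_text)
    flagged = []
    # iter() also yields the root itself when it is a <randlist>.
    for randlist in root.iter("randlist"):
        items = randlist.findall("item")  # direct <item> children only
        if not items:
            continue
        # Collect all text inside the last item, including nested elements.
        last_text = "".join(items[-1].itertext()).rstrip()
        if last_text.endswith("."):
            flagged.append(randlist)
    return flagged


# Sample data rebuilt from the question, using CRLF line endings.
GOOD_XML = "\r\n".join([
    '<randlist prefix="unorder">',
    '<item>abc</item>',
    '<item>abc</item>',
    '<item>abc</item>',
    '</randlist>',
])
BAD_XML = GOOD_XML.replace("<item>abc</item>\r\n</randlist>",
                           "<item>abc.</item>\r\n</randlist>")

print(len(randlists_with_trailing_full_stop(GOOD_XML)))
print(len(randlists_with_trailing_full_stop(BAD_XML)))
```

Because the parser works on elements rather than on lines, leading whitespace, extra content on a line, and CRLF versus LF stop mattering, which are exactly the cases where the regex is fragile.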