Capturing HTML comments with Regex but ignoring a specific comment

Question

I want to capture HTML comments, except for one specific comment, namely:

 <!-- end-readmore-item -->

Currently, I can capture all HTML comments with the following regular expression:

(?=<!--)([\s\S]*?)-->
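To see what this pattern captures, here is a minimal sketch that runs it against the question's HTML sample. It assumes Python's `re` module as the regex engine (the question does not name one); the `HTML` constant is the sample from the question with the wrapped "Further text" joined onto one line.

```python
import re

# The question's HTML test case.
HTML = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further text</div>\n'
    "<!-- end-readmore-item --></div>\n"
    "</div>\n"
    "&nbsp;<!-- -->"
)

# The question's pattern: a zero-width lookahead for "<!--", then a lazy run of
# any character (newlines included, via [\s\S]) up to the first "-->".
ALL_COMMENTS = re.compile(r"(?=<!--)([\s\S]*?)-->")

matches = [m.group(0) for m in ALL_COMMENTS.finditer(HTML)]
print(matches)  # ['<!-- end-readmore-item -->', '<!-- -->']
```

Both comments are returned, including the one that should be kept.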

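To ignore the specified comment, I tried lookahead and lookbehind assertions, but I am new to regex at this advanced level and may be missing something.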

So far, using lookarounds, I have come up with the following regex:

^((?!<!-- end-readmore-item -->).)*$
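The symptom described next can be reproduced with a short sketch in Python's `re`. It assumes the pattern is run with the multiline flag (like regex101's `m` flag); without it, `^` and `$` anchor to the whole input and nothing matches on this multi-line sample.

```python
import re

# The question's HTML test case.
HTML = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further text</div>\n'
    "<!-- end-readmore-item --></div>\n"
    "</div>\n"
    "&nbsp;<!-- -->"
)

# The question's lookaround pattern. With re.MULTILINE, ^ and $ anchor at every
# line, and "." never crosses a newline, so each whole line that does NOT
# contain the excluded comment matches in full.
LINE_FILTER = re.compile(r"^((?!<!-- end-readmore-item -->).)*$", re.MULTILINE)

matched_lines = [m.group(0) for m in LINE_FILTER.finditer(HTML)]
for line in matched_lines:
    print(line)
```

The matches are the `<div ...>` line, the bare `</div>` and `&nbsp;<!-- -->`: the pattern selects lines rather than comments, so ordinary markup comes back along with the comment.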

I want it to ignore the end-readmore-item comment and capture only other comments, such as:

<!-- Testing-->

It does skip that comment, but it also captures the regular HTML markup that I want to ignore.

I have been using the following HTML as a test case:

<div class="collapsible-item-body" data-defaulttext="Further text">Further text</div>
<!-- end-readmore-item --></div>
</div>
&nbsp;<!-- -->

It should only match <!-- -->, but it selects everything except <!-- end-readmore-item -->.

I plan to use this to remove all HTML comments except <!-- end-readmore-item -->.

Answer 1

You can use the following pattern:

<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->

Regex101 demo.
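As a rough check outside regex101, the sketch below applies this pattern in Python's `re` (an assumed engine) to the question's HTML sample, both to list the matches and to perform the removal the question is after. The name `KEEP_READMORE` is illustrative only.

```python
import re

# The question's HTML test case.
HTML = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further text</div>\n'
    "<!-- end-readmore-item --></div>\n"
    "</div>\n"
    "&nbsp;<!-- -->"
)

# "<!--" that is NOT followed by optional whitespace, "end-readmore-item",
# optional whitespace and "-->"; then a lazy run of any character
# (newlines included, via [\s\S]) up to the first "-->".
KEEP_READMORE = re.compile(r"<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->")

found = KEEP_READMORE.findall(HTML)    # no capture groups: whole matches
cleaned = KEEP_READMORE.sub("", HTML)  # drop every comment except the kept one

print(found)  # ['<!-- -->']
print(cleaned)
```

Only `<!-- -->` is matched, so `cleaned` still contains `<!-- end-readmore-item -->` while the trailing empty comment is gone.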

Breakdown:

<!--                    # Matches `<!--` literally.
(?!                     # Start of a negative Lookahead (not followed by).
    \s*                 # Matches zero or more whitespace characters.
    end-readmore-item   # Matches literal string.
    \s*                 # Matches zero or more whitespace characters.
    -->                 # Matches `-->` literally.
)                       # End of the negative Lookahead.
[\s\S]*?                # Matches any character zero or more times (lazy match),
                        # including whitespace and non-whitespace characters.
-->                     # Matches `-->` literally.

This basically means:

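Match <!-- that is not followed by [whitespace*] + end-readmore-item + [whitespace*] + -->, then any number of characters, immediately followed by -->.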

*Optional whitespace, repeated zero or more times.
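Because the breakdown is already written in free-spacing syntax, it can be compiled almost verbatim. A sketch using Python's `re.VERBOSE` flag (an assumption about the engine) that checks the annotated form agrees with the one-line pattern on a few made-up sample comments. The only difference between the two, `\s*` versus `\s*?` inside the lookahead, does not change what the lookahead accepts, because the next token is the non-whitespace `e`.

```python
import re

# The breakdown compiled as written: in re.VERBOSE mode, whitespace in the
# pattern is ignored and "#" starts a comment that runs to the end of the line.
VERBOSE_PATTERN = re.compile(
    r"""
    <!--                    # Matches `<!--` literally.
    (?!                     # Start of a negative lookahead (not followed by).
        \s*                 # Zero or more whitespace characters.
        end-readmore-item   # The literal string.
        \s*                 # Zero or more whitespace characters.
        -->                 # Matches `-->` literally.
    )                       # End of the negative lookahead.
    [\s\S]*?                # Any character, zero or more times (lazy).
    -->                     # Matches `-->` literally.
    """,
    re.VERBOSE,
)

# The one-line form of the same pattern, for comparison.
COMPACT_PATTERN = re.compile(r"<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->")

# Illustrative sample comments, made up for this check.
SAMPLES = [
    "<!-- end-readmore-item -->",
    "<!--end-readmore-item-->",
    "<!-- -->",
    "<!-- Testing-->",
    "<!--\n  multi-line\n-->",
]

for sample in SAMPLES:
    verbose_hit = VERBOSE_PATTERN.fullmatch(sample) is not None
    compact_hit = COMPACT_PATTERN.fullmatch(sample) is not None
    assert verbose_hit == compact_hit  # both forms agree on every sample
    print(repr(sample), verbose_hit)
```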

Answer 2

Your negative lookahead was very close; you just need to modify it as follows:

<!--((?!end-readmore-item).)*?-->

*? matches non-greedily (lazily).
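A sketch of this pattern in Python's `re` (an assumed engine) on the question's HTML sample. The repeated `((?!end-readmore-item).)` group is often called a tempered greedy token: each character is consumed only if "end-readmore-item" does not start at that position. One caveat worth checking: `.` does not match a newline by default, so a comment spanning several lines is skipped unless `re.DOTALL` is passed.

```python
import re

# The question's HTML test case.
HTML = (
    '<div class="collapsible-item-body" data-defaulttext="Further text">Further text</div>\n'
    "<!-- end-readmore-item --></div>\n"
    "</div>\n"
    "&nbsp;<!-- -->"
)

# Each "." is consumed only where "end-readmore-item" does not begin;
# the lazy *? stops at the first "-->".
ANSWER2 = re.compile(r"<!--((?!end-readmore-item).)*?-->")

# finditer + group(0) gives whole matches (findall would return the group).
found = [m.group(0) for m in ANSWER2.finditer(HTML)]
print(found)  # ['<!-- -->']

# "." stops at "\n" by default, so this comment is not matched...
multi = "<!--\nmulti-line\n-->"
plain = ANSWER2.search(multi)
# ...until the same pattern is compiled with re.DOTALL.
dotall = re.compile(ANSWER2.pattern, re.DOTALL).search(multi)
```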

This will match every comment except those whose body contains the string end-readmore-item.
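This is slightly broader than Answer 1: Answer 1 excludes only a comment whose whole body is end-readmore-item (plus optional whitespace), while this pattern excludes any comment that merely contains the string. A small Python check makes the difference concrete; the third sample comment is hypothetical, made up to probe that edge.

```python
import re

ANSWER1 = re.compile(r"<!--(?!\s*?end-readmore-item\s*-->)[\s\S]*?-->")
ANSWER2 = re.compile(r"<!--((?!end-readmore-item).)*?-->")

# comment -> (matched by Answer 1, matched by Answer 2)
CASES = {
    "<!-- end-readmore-item -->": (False, False),
    "<!-- Testing-->": (True, True),
    "<!-- end-readmore-item (old) -->": (True, False),  # hypothetical comment
}

for text, expected in CASES.items():
    got = (ANSWER1.fullmatch(text) is not None, ANSWER2.fullmatch(text) is not None)
    assert got == expected, (text, got)
```

If only the exact `<!-- end-readmore-item -->` marker must survive, Answer 1's stricter lookahead is the closer fit; Answer 2 also protects any comment that mentions the marker.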
