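I have this text, and I'm trying to remove all inner quotes so that only one level of quoting is kept. The text inside a quote can contain any character, including line breaks. Is this possible with a regular expression, or do I have to write a small parser?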
[quote=foo]I really like the movie. [quote=bar]World
War Z[/quote] It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
This is the text I want:
[quote=foo]I really like the movie. It's amazing![/quote]
This is my comment.
[quote]Hello, World[/quote]
This is another comment.
[quote]Bye Bye Baby[/quote]
This is the regex I'm using in PHP:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si
I also tried this variant, but it doesn't match . or , and I can't work out what else might appear inside the quotes:
%\[quote\s*(=[a-zA-Z0-9\-_]*)?\]([\w\s]+)\[/quote\]%i
The problem lies here:
(.*)
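To see what goes wrong with (.*), here is a minimal PHP sketch run on two sibling quotes taken from the sample text (the variable names are only for illustration). With the s modifier, . also matches newlines, and .* is greedy, so the match only backtracks as far as the last [/quote] in the string; the lazy .*? stops at the first one:

```php
<?php
$text = "[quote]Hello, World[/quote]\nThis is another comment.\n[quote]Bye Bye Baby[/quote]";

// Greedy: (.*) with the s flag runs on to the LAST [/quote] in the string,
// so both quotes and the comment between them end up in group 2.
preg_match('%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*)\[/quote\]%si', $text, $greedy);

// Lazy: (.*?) stops at the FIRST [/quote] after the opening tag.
preg_match('%\[quote\s*(=[a-zA-Z0-9\-_]*)?\](.*?)\[/quote\]%si', $text, $lazy);

echo $greedy[2], "\n---\n", $lazy[2], "\n";
```

The lazy form only fixes the runaway match between sibling quotes. On nested input it stops at the inner [/quote], so removing nested quotes still needs recursion or a small parser.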
You can use this:
$result = preg_replace('~\G(?!\A)(?>(\[quote\b[^]]*](?>[^[]+|\[(?!/?quote)|(?1))*\[/quote])|(?<!\[)(?>[^[]+|\[(?!/?quote))+\K)|\[quote\b[^]]*]\K~', '', $text);
Details:
\G(?!\A) # contiguous to a precedent match
(?> ## content inside "quote" tags at level 0
( ## nested "quote" tags (group 1)
\[quote\b[^]]*]
(?> ## content inside "quote" tags at any level
[^[]+
| # OR
\[(?!/?quote)
| # OR
(?1) # repeat the capture group 1 (recursive)
)*
\[/quote]
)
|
(?<!\[) # not preceded by an opening square bracket
(?> ## content that is not a quote tag
[^[]+ # all that is not a [
| # OR
\[(?!/?quote) # a [ not followed by "quote" or "/quote"
)+\K # repeat 1 or more and reset the match
)
| # OR
\[quote\b[^]]*]\K # "quote" tag at level 0
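The breakdown above is close to a usable pattern on its own: PCRE's x (extended) modifier ignores unescaped whitespace and treats # up to the end of the line as a comment. A minimal sketch that writes the same pattern that way (a nowdoc keeps the backslashes literal) and runs it on the sample text from the question; $pattern and $text are illustrative names:

```php
<?php
// Same pattern as the one-liner, in extended (x) mode so PCRE ignores
// the layout whitespace and the # comments.
$pattern = <<<'REGEX'
~
\G(?!\A)                          # contiguous to the previous match
(?>
    (                             # group 1: a nested quote block (removed)
        \[quote\b[^]]*]
        (?> [^[]+ | \[(?!/?quote) | (?1) )*
        \[/quote]
    )
  |
    (?<!\[)                       # not right after an opening bracket
    (?> [^[]+ | \[(?!/?quote) )+  # level-0 text inside the quote...
    \K                            # ...kept by resetting the match start
)
|
\[quote\b[^]]*]\K                 # opening quote tag at level 0 (kept)
~x
REGEX;

$text = "[quote=foo]I really like the movie. [quote=bar]World\nWar Z[/quote] It's amazing![/quote]\n"
      . "This is my comment.\n[quote]Hello, World[/quote]\n"
      . "This is another comment.\n[quote]Bye Bye Baby[/quote]";

$result = preg_replace($pattern, '', $text);
echo $result, "\n";
```

Only group 1 (a nested block) ever consumes text that ends up replaced; every other match is emptied by \K, so the level-0 text and tags are left in place.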
I think writing a parser would be easier. Use a regex to find [quote] and [/quote], then analyze the results.
// Collect every opening and closing quote tag together with its byte offset.
preg_match_all('#(\[quote[^]]*\]|\[\/quote\])#', $bbcode, $matches, PREG_OFFSET_CAPTURE);
$nestlevel = 0;
$cutfrom = 0;
$cut = false;
$removed = 0; // bytes already cut out of $bbcode, to shift the original offsets
foreach ($matches[0] as $quote) {
    // $quote[0] is the tag text, $quote[1] its offset in the ORIGINAL string
    if (substr($quote[0], 0, 2) == '[/') $nestlevel--; else $nestlevel++;
    if (!$cut && $nestlevel == 2) { // we reached the first nested quote, start removing here
        $cut = true;
        $cutfrom = $quote[1];
    }
    if ($cut && $nestlevel == 1) { // we closed the nested quote, stop removing here
        $cut = false;
        $length = $quote[1] + 8 - $cutfrom; // strlen('[/quote]') = 8
        $bbcode = substr_replace($bbcode, '', $cutfrom - $removed, $length);
        $removed += $length;
    }
}
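The same idea can also be written without offset bookkeeping. A minimal sketch, swapping in preg_split tokenization instead of the substr_replace approach above; the function name stripNestedQuotes is hypothetical. It splits the text into quote tags and plain text, tracks the nesting depth, and only keeps tokens at depth 0 or 1. Tags are matched case-insensitively, as in the question's patterns:

```php
<?php
// Hypothetical helper: keeps text at quote depth 0 and 1 and drops any
// [quote]...[/quote] block nested inside another quote.
function stripNestedQuotes(string $bbcode): string
{
    // Split on quote tags, keeping the tags themselves as separate tokens.
    $tokens = preg_split('#(\[quote[^]]*\]|\[/quote\])#i', $bbcode, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $depth = 0;
    $out = '';
    foreach ($tokens as $token) {
        $isClose = strcasecmp($token, '[/quote]') === 0;
        $isOpen = !$isClose && preg_match('#^\[quote[^]]*\]\z#i', $token) === 1;

        if ($isOpen) {
            $depth++;
        }
        // Opening tags, text and closing tags at the outer level are kept.
        if ($depth <= 1) {
            $out .= $token;
        }
        // Decrement after the check so a nested [/quote] is dropped too.
        if ($isClose && $depth > 0) {
            $depth--;
        }
    }
    return $out;
}

$text = "[quote=foo]I really like the movie. [quote=bar]World\nWar Z[/quote] It's amazing![/quote]\n"
      . "This is my comment.\n[quote]Hello, World[/quote]\n"
      . "This is another comment.\n[quote]Bye Bye Baby[/quote]";
echo stripNestedQuotes($text), "\n";
```

Because the depth counter handles any level, quotes nested three or more deep are dropped along with the second level.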
None of the above worked for me. I worked out two solutions that do:
~\[quote.*\].*\[\/quote\]*~si
or
~\[\/?quote.*\].*\[\/?quote\]~si