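I know others have asked about this error, but I can't see how this regex or the subject string could be any simpler.
To me this is a bug, but before submitting it to PHP I thought I would make sure and get some help checking whether this can be simplified.
Here is a small test script with two strings: one with 1024 x's and one with 1023: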
// 1024 x's
$str = '_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
// Outputs nothing (bug?)
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str);
echo "\n\n";
// 1023 x's
$str = '_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
// Outputs the unchanged string as expected
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str);
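As you can see, the error only occurs with the slightly longer string (more than 1024 characters). The strings this will process can be of any length: forum posts, news articles, and so on.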
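To see why the first call prints nothing, it may help to check the return value: `preg_replace` returns `NULL` when PCRE fails, and echoing `NULL` produces an empty string. The sketch below wraps the call in a hypothetical helper, `replace_or_report` (not part of the original script), and prints the code from `preg_last_error()`. That the failure here is a PCRE limit being hit (for example `PREG_JIT_STACKLIMIT_ERROR`) is an assumption, not something the post confirms.

```php
<?php
// Hypothetical helper (not from the original post): run preg_replace and
// surface PCRE failures instead of silently echoing an empty string.
function replace_or_report(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        return 'PCRE error code: ' . preg_last_error();
    }
    return $result;
}

// Same pattern as in the question; the subject is rebuilt with str_repeat.
$pattern = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
$str = '_' . str_repeat('x', 1024);

echo replace_or_report($pattern, '[i]${1}[/i]', $str), "\n";
```

If an error code is printed, comparing it against the `PREG_*_ERROR` constants in the PHP manual should narrow down which limit was reached.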
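Regex explanation
I'm just trying to do some Markdown parsing, converting strings like `_I am italic_` into a legacy markup that we use in some cases from an old site. The reasons and uses aren't important. What matters is that I believe this should work properly, and in fact it does everywhere except in PHP 7.
It should only match these underscores when they wrap a standalone word or sentence. The first underscore must not match if it is preceded by a word character, and the last underscore must not match if it is followed by a word character.
Environment: CentOS 7, PHP 7.1.6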
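Important note:
Avoid the `(.|\n)*?` and `(.|\r?\n)*?` patterns, because they cause too much redundant backtracking. To match any character, you can usually use `.` with the DOTALL flag or, in JavaScript, the `[^]` or `[\s\S]` constructs. For details, see "How do I match any character across multiple lines in a regular expression?".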
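As a small illustration of the note above, the following sketch compares the alternation-based "any character" idiom with the DOTALL modifier (`s` in PHP). The sample text and variable names are made up for the example.

```php
<?php
// Made-up sample text spanning several lines.
$text = "start\nmiddle\nend";

// Alternation-based "any character": it works, but the engine has to
// consider two branches at every position.
preg_match('/start(.|\n)*?end/', $text, $withAlternation);

// Same match using the DOTALL modifier `s`, under which `.` also
// matches line breaks.
preg_match('/start.*?end/s', $text, $withDotall);

// Both patterns capture the same overall match.
var_dump($withAlternation[0] === $withDotall[0]);
```

On short inputs the two behave the same; the difference only shows up as the subject grows.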
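The current problem
The `(.|\n(?!\n))*?` pattern is very inefficient and causes a lot of redundant backtracking whenever it is not at the very end of the pattern (and at the end, a lazy quantifier would make no sense at all). The further to the left of the pattern it sits, the worse the performance.
Since all it does is lazily match either any character other than a line break, or a line break that is not followed by another line break, you can rewrite the pattern as `.*?(?:\R(?!\R).*?)*`: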
'~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~'
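A quick way to try the rewritten pattern is to run it over a few inputs. The sketch below uses the replacement string from the question; the sample strings other than `_I am italic_` are made up for illustration.

```php
<?php
$pattern = '~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~';
$replacement = '[i]${1}[/i]';

$samples = [
    '_I am italic_',             // standalone phrase: should be wrapped
    'say _hello_ now',           // standalone word: should be wrapped
    'snake_case_name',           // underscores inside a word: left alone
    "_line one\nline two_",      // single line break inside: should be wrapped
    "_para one\n\npara two_",    // blank line inside: left alone
    '_' . str_repeat('x', 1024), // the long string from the question
];

foreach ($samples as $sample) {
    $result = preg_replace($pattern, $replacement, $sample);
    // A NULL result would indicate a PCRE error rather than "no match".
    var_dump($result === $sample ? 'unchanged' : $result);
}
```

The last sample has no closing underscore, so the expectation is that it comes back unchanged rather than as `NULL`.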
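Notes:
- `(?<=[^\w]|^)` = `\b`, because the lookbehind is followed by a `_` (a word char)
- `(?=[^\w]|$)` = `\b`, because the lookahead is preceded by a `_`
- `.*?(?:\R(?!\R).*?)*` matches:
  - `.*?` - any 0+ chars other than line break chars, as few as possible
  - `(?:\R(?!\R).*?)*` - zero or more sequences of:
    - `\R(?!\R)` - a line break sequence that is not followed by another line break sequence (`\R` covers `\n`, `\r\n` and `\r`)
    - `.*?` - any 0+ chars other than line break chars, as few as possible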
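The `\R(?!\R)` part of the breakdown above can be checked in isolation. This is a minimal sketch with made-up subjects: a single line break of any style should match, while a blank line (two consecutive line breaks) should not.

```php
<?php
// `a`, one line break sequence not followed by another, then `b`.
$pattern = '/a\R(?!\R)b/';

$cases = [
    "a\nb"       => 'single LF',
    "a\r\nb"     => 'single CRLF',
    "a\rb"       => 'single CR',
    "a\n\nb"     => 'two LFs (blank line)',
    "a\r\n\r\nb" => 'two CRLFs (blank line)',
];

foreach ($cases as $subject => $label) {
    $matched = preg_match($pattern, $subject) === 1;
    printf("%-24s %s\n", $label, $matched ? 'match' : 'no match');
}
```

Unlike the original `\n(?!\n)`, this also handles `\r\n` and `\r` line endings, which may matter for text pasted from different platforms.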