PHP regex with a word boundary after an escaped character


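I recently stumbled on this problem and can't figure out why it happens.

Consider the following example: I have some arbitrary text and an array of programming languages. In a loop, I match each language as a whole word using a regular expression with a leading and trailing word boundary (\b), then print a URL.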

$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$languages = [
    'PHP' => '/php/',
    'C++' => '/cpp/',
    'C' => '/c/',
];

foreach ($languages as $name => $uri) {
    $regex = '/\b' . preg_quote($name, '/') . '\b/';
    if (preg_match($regex, $string)) {
        echo "For {$name} information refer to http://foo.bar{$uri}" . PHP_EOL;
    }
}

I expect the following output:

For PHP information refer to http://foo.bar/php/
For C++ information refer to http://foo.bar/cpp/
For C information refer to http://foo.bar/c/

However, the output I get is:

For PHP information refer to http://foo.bar/php/
For C information refer to http://foo.bar/c/

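The word boundary (\b) after the escaped plus sign (+) does not behave the way I expected.

If I replace \b with [^\w] it works, but I'm not 100% sure this approach won't backfire.

Why does this happen, and how can I get the result I need?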

php regex preg-match
1 Answer

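\b only matches at a position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. In "C++ so", the final + and the space after it are both non-word characters, so the \b after \+\+ can never match there. The recommended way to solve this is to use lookarounds that assert there is no adjacent word character, instead of a word boundary, for example (?<!\w)c\+\+(?!\w)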
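To see the boundary behavior in isolation, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from the question.

```php
<?php
// \b matches only between a word character (\w) and a non-word character,
// or between a word character and the start/end of the string.
$pattern = '/\bC\+\+\b/';

// '+' followed by a space: both are non-word characters, so the trailing \b fails.
$beforeSpace = preg_match($pattern, 'I know C++ well');

// '+' followed by a letter: there IS a boundary after the '+', so this matches,
// which is the opposite of a whole-word match.
$beforeLetter = preg_match($pattern, 'I know C++x well');

// '+' at the very end of the string: no word character next to the end, so no boundary.
$atEnd = preg_match($pattern, 'I know C++');

var_dump($beforeSpace, $beforeLetter, $atEnd); // int(0), int(1), int(0)
```

In other words, a trailing \b after a non-word character requires a word character to follow, which is why it rejects "C++ " but accepts "C++x".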

$string = 'I don\'t know C e C++ so well, but I can code in PHP.';
$languages = [
    'PHP' => '/php/',
    'C++' => '/cpp/',
    'C' => '/c/',
];

foreach ($languages as $name => $uri) {
    $regex = '/(?<!\w)' . preg_quote($name, '/') . '(?!\w)/';
    if (preg_match($regex, $string)) {
        echo "For {$name} information refer to http://foo.bar{$uri}" . PHP_EOL;
    }
}

Output:

For PHP information refer to http://foo.bar/php/
For C++ information refer to http://foo.bar/cpp/
For C information refer to http://foo.bar/c/
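
As for the [^\w] variant mentioned in the question, one way it can backfire is that a character class has to consume an actual character, so it cannot match at the very start or end of the string. The sketch below compares the two approaches. The sample strings and the variable names ($consuming, $lookaround, $samples) are assumptions chosen for illustration.

```php
<?php
$name = preg_quote('C++', '/');

// Approach from the question: a non-word character must EXIST on both sides.
$consuming = '/[^\w]' . $name . '[^\w]/';
// Lookaround approach: only asserts that no word character is adjacent.
$lookaround = '/(?<!\w)' . $name . '(?!\w)/';

$samples = ['I like C++ a lot', 'C++ is fine', 'I like C++', 'ABC++'];

foreach ($samples as $sample) {
    printf(
        "%-18s consuming=%d lookaround=%d\n",
        $sample,
        preg_match($consuming, $sample),
        preg_match($lookaround, $sample)
    );
}
```

Both patterns find "C++" in the middle of a sentence and both reject "ABC++", but only the lookaround version finds it when it is the first or last thing in the string. The [^\w] version also includes the neighboring characters in the match, which matters if the match is later used for replacement.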