PHP preg_replace: remove non-alphanumeric characters and selected conjunctions, then split

Question (votes: 0, answers: 2)

I want to replace, in this string:

This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it

all non-alphanumeric characters except ' (as in doesn't), as well as all of these selected conjunctions:

is, it, its, the, this, if, so, and

So far I have this result:

Array
(
    [1] => This
    [2] => my
    [3] => Store
    [4] => has
    [5] => an
    [6] => amazing
    [7] => design
    [8] => s
    [9] => creator
    [10] => says
    [11] => was
    [12] => losing
    [13] => money
    [14] => and
    [15] => he
    [16] => doesn
    [17] => t
    [18] => want
    [19] => maintain
)

This is the code:

$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";
$words = array_filter(preg_split('/\s+/', preg_replace('/\W|\b(it|the|its|is|to)|\b/i', ' ', $string)));

print_r($words);

https://3v4l.org/cLrM4

But as you can see, it is matching "it" where it should match "its" (leaving a stray "s"), and it is also replacing the ' in doesn't.

Can someone help me understand what I'm doing wrong? X_X

P.S.: I also need it to be case-insensitive; /i is behaving really oddly :(

Thanks!

php preg-replace
2 Answers

1 vote

Change the regex to:

/\W\B|\b(it|the|its|is|to)\b/i

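The pipe in |\b doesn't make sense to me; perhaps it is a typo. The additional \B after \W ensures that a non-word character is replaced only when it is not immediately followed by a word character. This is less restrictive than what you asked for, but it can also be useful in other cases, such as hyphenated words (e.g. mother-in-law).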
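
As a rough sketch of how this pattern behaves on the string from the question: the snippet below applies it and then splits on whitespace. Using PREG_SPLIT_NO_EMPTY in place of the asker's array_filter() is my own substitution, not part of the answer; it drops empty pieces and keeps the keys sequential.

```php
<?php
// Sketch: apply the suggested pattern to the string from the question.
$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";

// \W\B       : a non-word character that is NOT followed by a word character
// \b(...)\b  : one of the listed words, matched as a whole word only
$cleaned = preg_replace('/\W\B|\b(it|the|its|is|to)\b/i', ' ', $string);

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY discards empty pieces.
$words = preg_split('/\s+/', $cleaned, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

With this word list, "This" and "and" are still kept, because the alternation only contains it, the, its, is and to; adding this, if, so and and to the group would remove them as well.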


0 votes

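First, remove all of the whole words listed in your blacklist (technically, not all of them are conjunctions in English) with a case-insensitive preg_replace() call.

Then use str_word_count() to extract the whole words (including contractions and hyphenated words).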

Code:

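$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";

print_r(
    str_word_count(
        // the closing \b makes sure only whole words are removed
        preg_replace('/\b(?:its|i[stf]|the|this|so|and)\b/i', '', $string),
        1  // format 1 returns the words as a flat, indexed array
    )
);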

Output:

Array
(
    [0] => my
    [1] => Store
    [2] => has
    [3] => an
    [4] => amazing
    [5] => design
    [6] => creator
    [7] => says
    [8] => was
    [9] => losing
    [10] => money
    [11] => he
    [12] => doesn't
    [13] => want
    [14] => to
    [15] => maintain
)
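
One detail worth checking when removing whole words this way is the closing \b. The sketch below compares the same alternation with and without it. The sample sentence is hypothetical and was chosen only because some of its longer words begin with a blacklisted word.

```php
<?php
// Hypothetical input: "island", "theme" and "solid" start with "is", "the" and "so".
$sample = "The island theme is so solid";

// No closing \b: matching prefixes of longer words are stripped too.
$open = preg_replace('/\b(?:its|i[stf]|the|this|so|and)/i', '', $sample);

// Closing \b: only whole words are removed.
$closed = preg_replace('/\b(?:its|i[stf]|the|this|so|and)\b/i', '', $sample);

print_r(str_word_count($open, 1));   // words left: land, me, lid
print_r(str_word_count($closed, 1)); // words left: island, theme, solid
```

On the string from the question both variants give the same result, since none of its remaining words starts with a blacklisted word; the difference only appears on other input.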