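In this string:
This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it
I want to replace every non-alphanumeric character except the apostrophe ('), and also every one of these selected conjunctions: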
is, it, its, the, this, if, so, and
So far I have gotten this result:
Array
(
[1] => This
[2] => my
[3] => Store
[4] => has
[5] => an
[6] => amazing
[7] => design
[8] => s
[9] => creator
[10] => says
[11] => was
[12] => losing
[13] => money
[14] => and
[15] => he
[16] => doesn
[17] => t
[18] => want
[19] => maintain
)
This is the code:
$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";
$words = array_filter(preg_split('/\s+/', preg_replace('/\W|\b(it|the|its|is|to)|\b/i', ' ', $string)));
print_r($words);
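But as you can see, it is replacing the "it" inside "its" when it should only replace the whole word "it", and it is also replacing the ' in "doesn't".
Can someone help me understand what I'm doing wrong? X_X
P.S.: I also need it to be case-insensitive, and /i is behaving very oddly :(
Thanks!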
Change the regular expression to:
/\W\B|\b(it|the|its|is|to)\b/i
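The pipe in |\b makes no sense to me; perhaps it is a typo. The additional \B after \W ensures that a non-word character is replaced only when it is not immediately followed by a word character. This is less restrictive than what you asked for, but it can be useful in other cases too, such as hyphenated words (e.g. mother-in-law).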
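First, remove every whole word on your blacklist with a case-insensitive preg_replace() call (strictly speaking, most of these are not conjunctions in English).
Then use str_word_count() to extract the whole words (including contractions and hyphenated words).
Code: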
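$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";
print_r(
    str_word_count(
        // strip the blacklisted whole words, case-insensitively
        preg_replace('/\b(?:its|i[stf]|the|this|so|and)\b/i', '', $string),
        1 // format 1 returns the words as a flat, indexed array
    )
);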
Output:
Array
(
[0] => my
[1] => Store
[2] => has
[3] => an
[4] => amazing
[5] => design
[6] => creator
[7] => says
[8] => was
[9] => losing
[10] => money
[11] => he
[12] => doesn't
[13] => want
[14] => to
[15] => maintain
)
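If the blacklist changes often, the same idea can be wrapped in a small function that builds the pattern from an array. This is only a sketch: extract_words() is a hypothetical name of mine, not something from the answer above, and it assumes the blacklist holds plain words.

```php
<?php
/**
 * Hypothetical helper: remove blacklisted whole words (case-insensitively),
 * then return the remaining words as a flat, indexed array.
 */
function extract_words(string $text, array $blacklist): array
{
    // preg_quote() guards against regex metacharacters in the word list.
    $alternatives = implode('|', array_map(function ($word) {
        return preg_quote($word, '/');
    }, $blacklist));

    // \b on both sides forces whole-word matches, so "is" never eats into "Island".
    $pattern = '/\b(?:' . $alternatives . ')\b/i';

    // Format 1 makes str_word_count() return the words it finds as an array.
    return str_word_count(preg_replace($pattern, '', $text), 1);
}

$string = "This is my Store, it has an amazing design; its creator says it was losing money and he doesn't want to maintain it";
$blacklist = ['is', 'it', 'its', 'the', 'this', 'if', 'so', 'and'];

print_r(extract_words($string, $blacklist));
```

Because both word boundaries are present, the order of the words in the blacklist does not matter, and a call such as extract_words('Island and isotope', ['is', 'and']) leaves "Island" and "isotope" untouched.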