Filter an array of URLs that must contain specific text and must not contain other text

Question (votes: 0, answers: 2)

I want to extract specific links from a website.

The links look like this:

/topic/Funny/G1pdeJm

The links are always the same, except for the random characters at the end.

I am having trouble combining these two checks:

(preg_match("/^http:\/\//i",$str) || is_file($str))

(preg_match("/Funny(.*)/", $str) || is_file($str))

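The first snippet matches every link; the second matches only the /topic/Funny/* part of a link.

Unfortunately, I can't combine them, and I also want to exclude these tags: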

/topic/Funny/viral
/topic/Funny/time
/topic/Funny/top
/topic/Funny/top/week
/topic/Funny/top/month
/topic/Funny/top/year
/topic/Funny/top/all
php regex validation url filtering
2 Answers
2
votes

You can try using a negative lookahead to filter out the URLs you don't want:

.*\/Funny\/(?!viral|time|top\/week|top\/month|top\/year|top\/all|top(\n|$)).*
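As a rough sketch of how this pattern might be applied to an array, the snippet below wraps it in `~` delimiters and passes it to `preg_grep()`. The `example.com` URLs and the `$urls`, `$pattern` and `$kept` names are assumptions made for illustration; only the /topic/Funny/... paths come from the question.

```php
<?php
// Hypothetical sample URLs; only the /topic/Funny/... paths come from the question.
$urls = [
    'http://example.com/topic/Funny/G1pdeJm',
    'http://example.com/topic/Funny/viral',
    'http://example.com/topic/Funny/top',
    'http://example.com/topic/Funny/top/week',
    'http://example.com/topic/NotFunny/IL2dsRq',
];

// The pattern from this answer, wrapped in ~ delimiters.
// Single quotes keep \/ and \n intact so the regex engine interprets them.
$pattern = '~.*\/Funny\/(?!viral|time|top\/week|top\/month|top\/year|top\/all|top(\n|$)).*~';

// preg_grep() keeps only the matching elements; array_values() reindexes them.
$kept = array_values(preg_grep($pattern, $urls));

var_export($kept);
```

With these inputs only the G1pdeJm URL should remain. Note that this pattern is not anchored to `http://`, so it does not check the scheme.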



0
votes

I'll prepare a set of test strings and show an implementation that filters the URLs with a regular expression.

Regex breakdown:

^
http://                              #match literal characters
[^/]+                                #match one or more non-slash characters (domain portion)
/topic/Funny/                        #match literal characters
(?!                                  #not followed by:
   viral                             #viral
   |time                             #OR time
   |top(?:/week|/month|/year|/all)?  #OR top, top/week, top/month, top/year, top/all
)
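The breakdown above can also be kept inside the pattern itself. The following is a minimal sketch that assumes the `x` (extended) modifier, under which whitespace in the pattern is ignored and `#` starts a comment; the two test URLs and the variable names are hypothetical.

```php
<?php
// Same pattern as the breakdown, written with the x modifier so the
// explanatory comments can live inside the regex itself.
$pattern = '~
    ^
    http://                              # literal scheme
    [^/]+                                # one or more non-slash characters (domain)
    /topic/Funny/                        # literal path prefix
    (?!                                  # not followed by:
        viral                            # viral
        |time                            # OR time
        |top(?:/week|/month|/year|/all)? # OR top and its sub-pages
    )
~x';

// Hypothetical URLs for illustration.
$allowed = preg_match($pattern, 'http://example.com/topic/Funny/G1pdeJm');
$blocked = preg_match($pattern, 'http://example.com/topic/Funny/top/week');

var_dump($allowed); // int(1)
var_dump($blocked); // int(0)
```

The comments must not contain the `~` delimiter, because PHP would treat it as the end of the pattern.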

Implementation:

$tests = [
    'http://example.com/topic/Funny/G1pdeJm',
    'http://example.com/topic/Funny/viral',
    'http://example.com/topic/Funny/time',
    'http://example.com/topic/Funny/top',
    'http://example.com/topic/Funny/top/week',
    'http://example.com/topic/Funny/top/month',
    'http://example.com/topic/Funny/top/year',
    'http://example.com/topic/Funny/top/all',
    'http://example.com/topic/NotFunny/IL2dsRq',
];

$result = [];
foreach ($tests as $str) {
    if (preg_match('~^http://[^/]+/topic/Funny/(?!viral|time|top(?:/week|/month|/year|/all)?)~', $str)) {
        $result[] = $str;
    }
}
var_export($result);

Output:

array (
  0 => 'http://example.com/topic/Funny/G1pdeJm',
)
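One caveat: the lookahead rejects anything that merely starts with viral, time or top, so a random ID that happens to begin with one of those words would also be dropped. A possible tightening, sketched here as an assumption rather than as part of the answer above, is to require a slash or the end of the string after the blocked word. The IDs `topAb12` and `timeless9` are made up to show the difference.

```php
<?php
// Hypothetical URLs; "topAb12" and "timeless9" are invented IDs.
$urls = [
    'http://example.com/topic/Funny/G1pdeJm',
    'http://example.com/topic/Funny/topAb12',
    'http://example.com/topic/Funny/timeless9',
    'http://example.com/topic/Funny/top',
    'http://example.com/topic/Funny/top/week',
    'http://example.com/topic/Funny/viral',
];

// Pattern from the answer: blocks any ID that starts with a listed word.
$loose = '~^http://[^/]+/topic/Funny/(?!viral|time|top(?:/week|/month|/year|/all)?)~';

// Assumed variant: the blocked word must be followed by "/" or the end of the string.
$strict = '~^http://[^/]+/topic/Funny/(?!(?:viral|time|top)(?:/|$))~';

$looseKept  = array_values(preg_grep($loose, $urls));
$strictKept = array_values(preg_grep($strict, $urls));

var_export($looseKept);
var_export($strictKept);
```

With these inputs the original pattern keeps only the G1pdeJm URL, while the stricter variant also keeps the topAb12 and timeless9 URLs. The variant blocks every sub-path of viral, time and top, not just the four listed under top, which may or may not be what is wanted.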