Get URLs and ignore other URLs


I want to get all href URLs except the ones that contain "/get/index.php" or "PICSNUM".

<a href="/video5505298733/travel_and_tourism_recovery_coronavirus." title="The places and companies missing tourist dollars most.">The places and companies missing tourist dollars most.</a></p><p class="info"><span class="bg"><span class="duration">10 min</span><a href="/get/index.php?id=qafMsaaScGLPuKqGuanBpZjHtGHKppeHpJu5r6G9raaHoqa3tJS-ope5tJK6s5TLqp8"><span class="name">CORONAVIRUS</span></a><span><span class="bolder"> - </span> 1.7k <span class="bolder">Views</span></span><span class="text-disabled"><span class="bolder"> - </span> 2 days ago</span><span class="bolder"> - </span></span></p></div></div>               <div class="thumb-lock "><div class="thumb-big"><div class="thumb"><a href="/midia54891337/PICSNUM/russia_fire_coronavirus_patients_intl"><img src="lightbox.gif" data-src="https://cdn-pic.cnews-cdn.com/videos/thumbs169/22/d3/a2/22d3a23423dfda7f5/22d3a2dfbb9fdfgd43f5.PICNUM.jpg"  /></a>

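I read the question "Regex that contains one thing but not another" to see how negative lookahead works, but I don't think I understand how it works.

I tried this, but it did not work: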

(?<=href=")^(?!\/(get|PICSNUM))[a-z0-9-_\/.]+

https://regex101.com/r/bG8Rq4/2

I changed it and the result is better, but part of the URL that contains PICSNUM is still returned:

(?<=href=")(?!\/(get|PICSNUM))[a-z0-9-_\/.]+

https://regex101.com/r/12HHHt/1

/video5505298733/travel_and_tourism_recovery_coronavirus.
/midia54891337/

What am I doing wrong? Regex is a bit confusing to me.

php regex preg-match
1 Answer

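The pattern does not work yet because /PICSNUM does not directly follow the lookbehind. The negative lookahead only inspects the characters immediately after href=", so a PICSNUM segment deeper in the path is never checked. The match then stops at the uppercase P, because the character class [a-z0-9-_\/.] contains no uppercase letters, which is why /midia54891337/ is returned.

You could use the alternation in the existing pattern, but that would not be very efficient. Instead you could use a capturing group.

If you have already found the values, you can use a negative lookahead to assert that the string does not start with /get or contain /PICSNUM:

^(?!(?:/get|\S*/PICSNUM))\S+

The same check can be applied in PHP to href values collected with DOMDocument.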
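To make the difference concrete, here is a minimal PHP sketch that runs the second attempt from the question, then a capturing-group variant, then the anchored lookahead on values that were already extracted. The shortened $html sample, the variable names, and the exact capturing-group pattern (including the use of [^\s"]) are assumptions for illustration, not the answerer's original code.

```php
<?php
// Shortened version of the markup from the question (assumed sample).
$html = '<a href="/video5505298733/travel_and_tourism_recovery_coronavirus." title="t">x</a>'
      . '<a href="/get/index.php?id=qafMsaaScGLP"><span class="name">CORONAVIRUS</span></a>'
      . '<a href="/midia54891337/PICSNUM/russia_fire_coronavirus_patients_intl"><img src="lightbox.gif" /></a>';

// Second attempt from the question: the lookahead only looks at what comes
// right after href=", and the character class stops at the uppercase "P".
preg_match_all('~(?<=href=")(?!\/(get|PICSNUM))[a-z0-9-_\/.]+~', $html, $m);
$attemptMatches = $m[0];
// Contains the full /video... URL and the partial "/midia54891337/".

// Capturing-group variant (assumed pattern): reject values that start with
// /get or that contain /PICSNUM anywhere before the closing quote.
preg_match_all('~href="(?!/get|[^\s"]*/PICSNUM)([^\s"]+)"~', $html, $m);
$wanted = $m[1];

// If the href values were already collected (for example with DOMDocument),
// the anchored pattern from the answer can filter them directly.
$hrefs = [
    '/video5505298733/travel_and_tourism_recovery_coronavirus.',
    '/get/index.php?id=qafMsaaScGLP',
    '/midia54891337/PICSNUM/russia_fire_coronavirus_patients_intl',
];
// preg_grep keeps the original keys, so reindex the result.
$filtered = array_values(preg_grep('~^(?!(?:/get|\S*/PICSNUM))\S+~', $hrefs));

print_r($wanted);
print_r($filtered);
```

In this sketch both $wanted and $filtered should hold only the /video... URL. Note that a bare /get prefix would also exclude paths such as /getaway; use /get/ in the lookahead if that matters for your data.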
