Regex to extract the first link inside another tag on a page [duplicate]

Question

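I have been trying to set up a simple PHP API that retrieves information from another site in basically two steps. A person doing this by hand would:

Search the site
Click the first result
Look up the information

The site is laid out predictably. I know the format of its search URLs, so I can build the search URL in PHP from the input passed to the API.

The link for step 1/2 has the following format: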

<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>

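I only want somelinkhere, the hyperlink itself. I know it is the first hyperlink on the page that is wrapped in an <h4>.

I tried several regular expressions with preg_match, but they all failed. For example, here is one attempt that failed: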

$url = "https://www.example.com/?query=somequery";
$input = @file_get_contents($url) or die("Could not access file: $url");
preg_match_all('/<h4><a [^>]*\bhref\s*=\s*"\K[^"]*[^"]*/', $text, $results);
echo "$results";
echo "$results[0]";
echo "$results[0][0]";

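I added the last three echo statements because I am not very familiar with the format that preg_match_all returns. I also tried preg_match, with the same result. I only care about the first such link, so I don't need preg_match_all, but it would be fine if I could get the first result out of it.

What is the best way to parse the page and put the first hyperlink inside an h4 into a variable?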

Answer 1

Maybe, if you only want to extract the first h4, you could modify it to

(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*

with the s flag.

$re = '/(?i)<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*".*/s';
$str = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4><h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>
';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
    print($match[1]);
}

Output

somelinkhere
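Since only the first link matters, preg_match may be enough on its own. The following is a minimal sketch under a few assumptions: $html is a hypothetical sample string standing in for the fetched page, and the pattern is the expression above without the trailing .* and the s flag, which are not needed when matching stops at the first hit. Note also that the snippet in the question reads the page into $input but passes $text to preg_match_all, and that alone would leave $results empty.

```php
<?php
// Hypothetical sample standing in for the fetched page; a real script would
// pass the string returned by file_get_contents() (the question's $input).
$html = '<h4><a href="somelinkhere" class="search_result_title" title="sometitle" data-followable="true">Some Text Here</a></h4>'
      . '<h4><a href="anotherlink" class="search_result_title">Other Text</a></h4>';

// The answer's expression without the trailing .* and the s flag:
// preg_match() stops at the first match, so nothing else has to be consumed.
$pattern = '/<h4><a [^>]*\bhref\s*=\s*"\s*([^"]*)\s*"/i';

$firstLink = null;
if (preg_match($pattern, $html, $m) === 1) {
    // $m[0] holds the whole matched text, $m[1] the first capture group.
    $firstLink = $m[1];
}

echo $firstLink, PHP_EOL;
```

With preg_match_all and no flags, the same capture would sit at $results[1][0] rather than $results[0][0]; echoing $results itself only prints "Array", because it is an array.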

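If you want to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also watch there how it matches against some sample input.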
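The question asks for the best way to parse the page, so it may be worth comparing the regex with an HTML parser. The sketch below swaps in a different technique, PHP's DOMDocument and DOMXPath, instead of a regular expression. It assumes the dom extension is enabled, and $html is again a hypothetical sample rather than the real page.

```php
<?php
// Hypothetical sample markup; in practice this would be the fetched page.
$html = '<html><body>'
      . '<h4><a href="somelinkhere" class="search_result_title">Some Text Here</a></h4>'
      . '<h4><a href="anotherlink" class="search_result_title">Other Text</a></h4>'
      . '</body></html>';

// Real pages are rarely valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// (//h4//a[@href])[1] selects the first link, in document order,
// that sits inside an h4 element.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('(//h4//a[@href])[1]');

$firstHref = $nodes->length > 0 ? $nodes->item(0)->getAttribute('href') : null;

echo $firstHref, PHP_EOL;
```

Unlike the regex, this approach does not depend on the exact spacing or attribute order of the markup, at the cost of parsing the whole document.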
