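How can I use PHP to retrieve specific elements and attributes from a remote HTML page?
For example, the element and attribute I want to retrieve look like this: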
<a href="/dir/someid/" class="ccc">
Any help would be greatly appreciated.
The approach I intend to use:
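For the exact markup in the question, a minimal sketch could walk every `<a>` element with `DOMDocument` and keep the `href` of those whose `class` attribute is `ccc`. It assumes the remote page has already been fetched into a string; the hardcoded `$html` sample below is hypothetical and only stands in for that page.

```php
<?php
// Hypothetical sample markup standing in for the fetched remote page.
$html = '<html><body>'
      . '<a href="/dir/someid/" class="ccc">First</a>'
      . '<a href="/other/" class="xyz">Other</a>'
      . '</body></html>';

// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Exact string comparison: an element with class="ccc other" would NOT match.
    if ($a->getAttribute('class') === 'ccc') {
        $hrefs[] = $a->getAttribute('href');
    }
}

print_r($hrefs);
```

The comparison is deliberately simple; if the target links can carry several classes, the check on the `class` attribute needs to split it into tokens first.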
<?php
// Open the remote page for reading (requires allow_url_fopen = On).
$file = fopen("http://www.example.com/", "r");
if (!$file) {
    echo "<p>Unable to open remote file.</p>\n";
    exit;
}

$title = null; // stays null if no <title> line is found
while (!feof($file)) {
    $line = fgets($file, 1024);
    /* This only works if the title and its tags are on one line */
    if (preg_match("@<title>(.*)</title>@i", $line, $out)) {
        $title = $out[1];
        break;
    }
}
fclose($file);
?>
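The line-by-line regex above misses titles that span several lines. One way around that, sketched here under the assumption that the whole page fits comfortably in memory, is to read the full response first and let `DOMDocument` find the `<title>` element. The function name `extractTitle` is hypothetical, and a multi-line sample string replaces the actual download.

```php
<?php
// Hypothetical helper: returns the <title> text of an HTML string, or null if absent.
function extractTitle(string $html): ?string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

// In practice $html would come from file_get_contents("http://www.example.com/"),
// which also requires allow_url_fopen; a multi-line sample is used here instead.
$html = "<html><head><title>\n  Example Domain\n</title></head><body></body></html>";
echo extractTitle($html), "\n"; // Example Domain
```

Because the parser sees the whole document, line breaks inside the tag no longer matter.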
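<?php
// Fetch the page (requires allow_url_fopen = On).
$homepage = file_get_contents("https://www.somedomain.com");
// Collect libxml warnings about invalid markup instead of hiding all errors with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadHTML($homepage);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
$results = $xpath->query("//div[@class='some-class']");
// Collect the text and href of each matching div's first child <a>.
$links = [];
foreach ($results as $contextNode) {
    $links[] = [
        'text' => $xpath->evaluate("string(./a[1])", $contextNode),
        'href' => $xpath->evaluate("string(./a[1]/@href)", $contextNode),
    ];
}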
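The predicate `@class='some-class'` only matches when the attribute is exactly that string, so an element such as `class="ccc active"` is skipped. A common XPath 1.0 idiom pads the normalized class list with spaces and searches for the token. The sketch below applies it to the `<a class="ccc">` links from the question; `findLinksByClass` is a hypothetical name, and it assumes `$class` contains no quote characters, since the value is interpolated into the query unescaped.

```php
<?php
// Hypothetical helper: hrefs of every <a> whose class list contains $class.
// Assumes $class has no quote characters (it is inserted into the XPath as-is).
function findLinksByClass(string $html, string $class): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // " ccc " matches "ccc", "nav ccc" and "ccc active", but not "ccc2".
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$html = '<a href="/dir/someid/" class="ccc">A</a>'
      . '<a href="/dir/other/" class="nav ccc">B</a>'
      . '<a href="/dir/skip/" class="ccc2">C</a>';
print_r(findLinksByClass($html, 'ccc'));
```

With the sample above, the first two links are returned in document order and the `ccc2` link is left out.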