How can I use PHP to retrieve specific elements and attributes from a remote HTML page?

Question (votes: 0, answers: 1)

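How can I use PHP to retrieve specific elements and attributes from a remote HTML page?

For example, suppose the element and attributes to retrieve have this form:

<a href="/dir/someid/" class="ccc">

Any help would be greatly appreciated.

This is the approach I plan to use: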


<?php
   // Open the remote page as a read-only stream (requires allow_url_fopen).
   $file = fopen("http://www.example.com/", "r");
   if (!$file) {
       echo "<p>Unable to open remote file.</p>\n";
       exit;
   }
   while (!feof($file)) {
       $line = fgets($file, 1024);
       if ($line === false) {
           break; // read error or end of stream
       }
       /* This only works if the title and its tags are on one line */
       if (preg_match("@<title>(.*?)</title>@i", $line, $out)) {
           $title = $out[1];
           break;
       }
   }
   fclose($file);
?>

php html web-crawler extract
1 Answer
0
votes
$homepage = file_get_contents("https://www.somedomain.com");
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
@$doc->loadHTML($homepage);
$xpath = new DOMXPath($doc);
$results = $xpath->query("//div[@class='some-class']");
foreach ($results as $contextNode) {
    $text = $xpath->evaluate("string(./a[1])", $contextNode);
    $href = $xpath->evaluate("string(./a[1]/@href)", $contextNode);
}
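The same DOMDocument and DOMXPath approach can be pointed directly at the anchor from the question, `<a href="/dir/someid/" class="ccc">`. The following is a minimal sketch, not a definitive implementation: the helper name `extractLinksByClass` and the inline sample markup are assumptions made for illustration, and the sample string stands in for the result of `file_get_contents()` so that the snippet runs without network access.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the href and
// text of every <a> element whose class attribute contains the given class.
// Assumes $className is a plain token without quote characters.
function extractLinksByClass(string $html, string $className): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of silencing them with the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, so "ccc" does not match "cccc".
    $query = sprintf(
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $className
    );

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $links[] = [
            'href' => $anchor->getAttribute('href'),
            'text' => trim($anchor->textContent),
        ];
    }
    return $links;
}

// Inline sample markup (an assumption). For a remote page, the string could
// instead come from file_get_contents('http://www.example.com/').
$sampleHtml = '<html><body>'
    . '<a href="/dir/someid/" class="ccc">First</a>'
    . '<a href="/other/" class="ddd">Second</a>'
    . '</body></html>';

$links = extractLinksByClass($sampleHtml, 'ccc');
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Querying `//a` directly avoids depending on a wrapping `div`, and the token-based class test still matches anchors that carry several classes, such as `class="ccc active"`.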