How to access the meta description using https://github.com/voku/simple_html_dom


I am trying to use https://github.com/voku/simple_html_dom to read the h1 text and the meta description text of a page. I can read the h1, but not the meta description text.

The PHP code is as follows:

use voku\helper\HtmlDomParser;

require_once 'vendor/autoload.php';

$urls = array("https://example.org/a.php", "https://example.org/b.php", "https://example.org/c.php");

$urls_count = count($urls);

for ($i = 0; $i < $urls_count; $i++) {
    $dom = HtmlDomParser::file_get_html($urls[$i]);
    $h1 = $dom->findOne('h1')->innertext; // this works, assuming there is only one h1
    $description = $dom->findOne("meta[name='description']")->innertext; // this returns nothing
    echo '<div>
<p><strong><a href="'.$urls[$i].'">'.$h1.'</a></strong></p><p>'.$description.'</p></div>';
}
php simple-html-dom
1 Answer

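A meta tag has no inner text; its value is stored in the content attribute, so innertext comes back empty. Read the attribute with getAttribute('content') instead. You can find an example here: https://github.com/voku/simple_html_dom/blob/master/example/example_extract_meta_tags.php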

$templateHtml = '
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <meta name="description" content="Free Web tutorials">
  <meta name="keywords" content="HTML,CSS,XML,JavaScript">
  <meta name="author" content="Lars Moelleken">
</head>
<body>
<p>All meta information goes in the head section...</p>
</body>
</html>
';
// Parse the HTML string (requires: use voku\helper\HtmlDomParser;)
$htmlTmp = HtmlDomParser::str_get_html($templateHtml);
// Collect the content of every meta tag, grouped by its name attribute
$meta_data = [];
foreach ($htmlTmp->find('meta') as $meta) {
    if ($meta->hasAttribute('content')) {
        $meta_data[$meta->getAttribute('name')][] = $meta->getAttribute('content');
    }
}
// dump contents
/** @noinspection ForgottenDebugOutputInspection */
var_export($meta_data, false);
/*
[
    'description' => [
        'Free Web tutorials',
    ],
    'keywords' => [
        'HTML,CSS,XML,JavaScript',
    ],
    'author' => [
        'Lars Moelleken',
    ],
]
 */
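
For the case in the question, where only the description is needed, the same idea can be sketched more directly. This is a minimal sketch, not taken from the library's examples: the inline HTML string and the heading text are made up for illustration, and it assumes the library is installed via Composer so that vendor/autoload.php exists.

```php
<?php
use voku\helper\HtmlDomParser;

require_once 'vendor/autoload.php';

// Hypothetical sample page, inlined so the sketch does not need a network request.
$html = '<html><head><meta name="description" content="Free Web tutorials"></head>'
      . '<body><h1>Example heading</h1></body></html>';

$dom = HtmlDomParser::str_get_html($html);

// h1 is a normal element, so its text is available as inner text.
$h1 = $dom->findOne('h1')->innertext;

// meta is a void element: the description lives in the "content" attribute.
$description = $dom->findOne("meta[name='description']")->getAttribute('content');

// Escape both values before writing them into HTML output.
echo '<p><strong>' . htmlspecialchars($h1) . '</strong></p>'
   . '<p>' . htmlspecialchars($description) . '</p>';
```

In the loop from the question, the equivalent change would be to replace ->innertext with ->getAttribute('content') on the $description line; the rest of the loop can stay as it is.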

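PS: If you are looking for examples, the library's tests are a good place to start; and if a library has no tests, you probably do not want to use it anyway. :-)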
