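I am trying to use https://github.com/voku/simple_html_dom to read the h1 and the meta description text of a page. I can read the h1, but not the meta description text.
The PHP code is as follows: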
use voku\helper\HtmlDomParser;
require_once 'vendor/autoload.php';
$urls = array("https://example.org/a.php", "https://example.org/b.php", "https://example.org/c.php");
$urls_count = count($urls);
for ($i = 0; $i < $urls_count; $i++) {
$dom = HtmlDomParser::file_get_html($urls[$i]);
$h1 = $dom->findOne('h1')->innertext; // this returns the text, assuming there is only one h1
$description = $dom->findOne("meta[name='description']")->innertext; // this returns nothing
echo '<div>
<p><strong><a href='.$urls[$i].'>'.$h1.'</a></strong> </p><p>'.$description.'</p></div>';
}
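A meta element has no inner text; its value is stored in the content attribute, so read it with getAttribute('content') instead of innertext. You can find an example here: https://github.com/voku/simple_html_dom/blob/master/example/example_extract_meta_tags.php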
$templateHtml = '
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="description" content="Free Web tutorials">
<meta name="keywords" content="HTML,CSS,XML,JavaScript">
<meta name="author" content="Lars Moelleken">
</head>
<body>
<p>All meta information goes in the head section...</p>
</body>
</html>
';
$htmlTmp = HtmlDomParser::str_get_html($templateHtml);
$meta_data = []; // initialise so var_export() below works even if no meta tag matches
foreach ($htmlTmp->find('meta') as $meta) {
if ($meta->hasAttribute('content')) {
$meta_data[$meta->getAttribute('name')][] = $meta->getAttribute('content');
}
}
// dump contents
/** @noinspection ForgottenDebugOutputInspection */
var_export($meta_data, false);
/*
[
'description' => [
'Free Web tutorials',
],
'keywords' => [
'HTML,CSS,XML,JavaScript',
],
'author' => [
'Lars Moelleken',
],
]
*/
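Applied to the code in the question, the same idea might look like the sketch below. It assumes voku/simple_html_dom is installed via Composer. To run without network access it uses an inline HTML string, a made-up sample rather than one of the URLs from the question. It relies only on calls that already appear above: str_get_html, findOne, innertext and getAttribute.

```php
<?php
use voku\helper\HtmlDomParser;

require_once 'vendor/autoload.php';

// Hypothetical sample page standing in for one of the fetched URLs.
$html = '<!DOCTYPE html>
<html>
<head>
<meta name="description" content="Free Web tutorials">
</head>
<body>
<h1>Example heading</h1>
</body>
</html>';

$dom = HtmlDomParser::str_get_html($html);

// The heading text lives between the tags, so innertext works here.
$h1 = $dom->findOne('h1')->innertext;

// A meta tag is empty; its value is stored in the content attribute.
$description = $dom->findOne("meta[name='description']")->getAttribute('content');

// Escape both values before putting them into HTML output.
echo '<div><p><strong>' . htmlspecialchars($h1) . '</strong></p><p>'
    . htmlspecialchars($description) . '</p></div>';
```

In the loop from the question, only the $description line should need to change; the file_get_html call can stay as it is.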
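PS: If you are looking for examples, the tests are always a good place to look, and if a library has no tests, you probably do not want to use it anyway. :-)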