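Goal:
The problem I'm having: I can't get PHP to even recognise xhtml:link as a childNode of the <url> item; even if I just print nodeValue for <url>, it omits all of the <xhtml:link> child nodes.
The code I'm using/trying: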
<?php
$xml = <<< XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<lastmod>2018-11-07</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ae" href="https://www.example.com/ae/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="de-at" href="https://www.example.com/at/de/cat/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-au" href="https://www.example.com/au/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07660/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<loc>https://www.example.com/ca/en/cat/categories/series/07683/</loc>
<lastmod>2018-11-07</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ae" href="https://www.example.com/ae/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="de-at" href="https://www.example.com/at/de/cat/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-au" href="https://www.example.com/au/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-be" href="https://www.example.com/be/fr/collections/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="nl-be" href="https://www.example.com/be/nl/collecties/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-bh" href="https://www.example.com/bh/en/cat/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07683/" />
<xhtml:link xmlns:xhtml="http://www.w3.org/1999/xhtml" rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07683/" />
</url>
</urlset>
XML;
$urlsxml = new DOMDocument;
$urlsxml->loadXML($xml);
$urls = $urlsxml->getElementsByTagName('url');
for ($i = 0; $i < $urls->length; $i++) {
echo $urls->item($i)->nodeValue;
echo $urls->getElementsByTagName("xhtml:link")->attributes->getNamedItem("hreflang")->nodeValue;
// INSERT INTO DB
}
?>
I'm out of ideas; any help would be greatly appreciated.
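The XML uses two namespaces: http://www.sitemaps.org/schemas/sitemap/0.9 without an alias (it is the default namespace), and http://www.w3.org/1999/xhtml with the alias xhtml. To read XML that uses namespaces, you should use the *NS variants of the DOM methods.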
$urls = $urlsxml->getElementsByTagNameNS(
'http://www.sitemaps.org/schemas/sitemap/0.9', 'url'
);
$urls[$i]->getElementsByTagNameNS('http://www.w3.org/1999/xhtml', 'link');
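The first argument is the namespace URI, the second is the local name (the node name without the prefix). In this case it is a good idea to use constants/variables for the namespace URIs.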
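The two calls above can be put together into a small runnable sketch. It assumes a shortened copy of the question's sitemap (one url with two alternates), and the constant names SITEMAP_NS and XHTML_NS are made up for this illustration; only the URIs come from the XML itself.

```php
<?php
// Hypothetical constant names; the URIs are the ones declared in the question's XML.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';
const XHTML_NS   = 'http://www.w3.org/1999/xhtml';

// Shortened copy of the question's sitemap (assumption: one url is enough to illustrate).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<xhtml:link rel="alternate" hreflang="en-ca" href="https://www.example.com/ca/en/cat/categories/series/07660/" />
<xhtml:link rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
</urlset>
XML;

$document = new DOMDocument;
$document->loadXML($xml);

$rows = [];
foreach ($document->getElementsByTagNameNS(SITEMAP_NS, 'url') as $url) {
    // Both lookups take the namespace URI plus the local name, never the prefix.
    $loc = $url->getElementsByTagNameNS(SITEMAP_NS, 'loc')->item(0);
    foreach ($url->getElementsByTagNameNS(XHTML_NS, 'link') as $link) {
        $rows[] = [
            'loc'      => $loc ? $loc->textContent : '',
            'hreflang' => $link->getAttribute('hreflang'),
            'href'     => $link->getAttribute('href'),
        ];
    }
}
print_r($rows);
```

Each entry in $rows pairs one alternate link with the loc of its parent url, which is roughly the shape a database insert would need.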
A more comfortable option is Xpath. It lets you use location paths and conditions to fetch nodes.
$document = new DOMDocument;
$document->loadXML($xml);
// create an xpath instance for the document
$xpath = new DOMXPath($document);
// register the namespaces for your own prefixes
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
// iterate all sitemap url elements
foreach ($xpath->evaluate('//s:url') as $url) {
$data = [
// get the sitemap loc child element as a string
'loc' => $xpath->evaluate('string(s:loc)', $url),
// get the href attribute of the xhtml link element (with language condition)
'fr-ca' => $xpath->evaluate('string(x:link[@hreflang="fr-ca"]/@href)', $url),
];
var_dump($data);
}
Output:
array(2) {
["loc"]=>
string(58) "https://www.example.com/ca/en/cat/categories/series/07660/"
["fr-ca"]=>
string(58) "https://www.example.com/ca/fr/cat/categories/series/07660/"
}
array(2) {
["loc"]=>
string(58) "https://www.example.com/ca/en/cat/categories/series/07683/"
["fr-ca"]=>
string(58) "https://www.example.com/ca/fr/cat/categories/series/07683/"
}
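string() in Xpath casts the first node of a list to a string. It lets you avoid explicit access to node object properties. For example, $xpath->evaluate('s:loc', $url)->item(0)->textContent; can be written as $xpath->evaluate('string(s:loc)', $url);. Unlike the property access, the Xpath cast does not fail with an error if no matching node exists. It returns an empty string.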
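To make the difference concrete, here is a minimal sketch that assumes a tiny sitemap with a loc but no xhtml:link at all. The prefixes s and x are the same self-chosen ones as above; the example.com URL is a placeholder.

```php
<?php
$document = new DOMDocument;
$document->loadXML(
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>https://www.example.com/ca/en/</loc></url>'
    . '</urlset>'
);

$xpath = new DOMXPath($document);
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

$url = $xpath->evaluate('//s:url')->item(0);

// No x:link exists, so the node list is empty and item(0) is null;
// reading ->textContent or calling ->getAttribute() on it would fail.
$missing = $xpath->evaluate('x:link[@hreflang="fr-ca"]', $url)->item(0);
var_dump($missing === null);

// The string() cast over the same empty result gives an empty string instead.
$href = $xpath->evaluate('string(x:link[@hreflang="fr-ca"]/@href)', $url);
var_dump($href === '');

// An existing node is cast to its text content.
$loc = $xpath->evaluate('string(s:loc)', $url);
var_dump($loc);
```

With the cast, a missing alternate simply becomes an empty value that can be checked before writing to the database.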
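The actual insertion into a database is beyond the scope of the code here, but to parse the XML you can do something simple like the following (based on a locally saved copy of the XML rather than the heredoc syntax; the file name is only for identification).
Initially I thought this would require registering a namespace and using it in the XPath expression, but that turned out not to be the case: with the parent url node as the reference node for the query, a simple XPath query per url node is enough.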
$file='so-stack-xml-namespace.xml';
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=true;
$dom->recover=true;
$dom->strictErrorChecking=true;
$dom->load( $file );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$urls=$dom->getElementsByTagName('url');
foreach( $urls as $url ){
$href=$url->nodeValue;
$frca=$xp->query('xhtml:link[@hreflang="fr-ca"]',$url)->item(0)->getAttribute('href');
/* do something with the variables...add to DB */
printf('href:%s<br />frca:%s<br /><br />', $href,$frca);
}
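A slightly more defensive variant of the loop above, as a sketch only: it assumes an inline, shortened XML string instead of the saved file, reads just the loc child (nodeValue of the whole url concatenates the text of every child element), and guards against a url that has no fr-ca alternate. The $results array is added purely for illustration.

```php
<?php
// Shortened stand-in for the saved sitemap file (assumption: two urls, one without fr-ca).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://www.example.com/ca/en/cat/categories/series/07660/</loc>
<lastmod>2018-11-07</lastmod>
<xhtml:link rel="alternate" hreflang="fr-ca" href="https://www.example.com/ca/fr/cat/categories/series/07660/" />
</url>
<url>
<loc>https://www.example.com/ca/en/cat/categories/series/07683/</loc>
<lastmod>2018-11-07</lastmod>
<xhtml:link rel="alternate" hreflang="de-at" href="https://www.example.com/at/de/cat/07683/" />
</url>
</urlset>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

$results = [];
foreach ($dom->getElementsByTagName('url') as $url) {
    // Read only the loc child rather than the text of the whole url element.
    $locNode = $url->getElementsByTagName('loc')->item(0);
    $loc = $locNode ? trim($locNode->textContent) : '';

    // The xhtml prefix is in scope on the context node, so the query
    // resolves it without an explicit registerNamespace() call.
    $link = $xp->query('xhtml:link[@hreflang="fr-ca"]', $url)->item(0);
    $frca = $link instanceof DOMElement ? $link->getAttribute('href') : '';

    $results[] = ['loc' => $loc, 'fr-ca' => $frca];
    printf("loc:%s frca:%s\n", $loc, $frca);
}
```

The second url has no fr-ca link, so its entry carries an empty string rather than triggering an error on a null node.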
If you put the contents of the XML file into a variable, you can extract the values with a loop:
$xml = file_get_contents("your_xml_file");
$tags = explode("<", $xml);
$loc = "not found";
$frhref = "not found";
foreach ($tags as $tag){
if(strpos($tag, "loc>") === 0){
$loc = substr($tag, 4);
}
if(strpos($tag, "xhtml:link") === 0){
$at = strpos($tag, "hreflang") + 9;
$lang = substr($tag, $at, 7);
if($lang == '"fr-ca"'){
$at = strpos($tag, "href=") + 6;
$_href = substr($tag, $at);
$until = strpos($_href, '"');
$frhref = substr($_href, 0, $until);
}
}
}
echo $loc, " ", $frhref; //put them in your db
I tested it with your content: https://3v4l.org/1laON