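... random IDs for the headings (<h2> and <h3>). I want to create a script that parses an article, targets each h2 and h3 tag, and creates an id attribute from the text it contains.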
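I thought I could do this with preg_replace_callback(), but when I used that function I realized that it does not work in some cases. For example, it fails when the h2/h3 text starts with a space, a number, and so on.
Here is an early attempt that works in some cases: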
function function_to_makeItClear($string) {
    $string = strtolower($string);
    $string = str_ireplace(' ', '-', $string);
    return preg_replace('/[^A-Za-z0-9\-]/', '', $string);
}

function betterId($match) {
    $escape = str_split(strip_tags($match[2]), 20);
    $id = strlen($escape[0]) >= 5 ? function_to_makeItClear($escape[0]) : str_shuffle('AnyWordsHere');
    return '<h'.$match[1].' id="'.$id.'">'.$match[2].'</h'.$match[1].'>';
}

return preg_replace_callback('#<h([1-6]).*?>(.*?)<\/h[1-6]>#si', 'betterId', $texte);
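To make the failure cases concrete, here is a minimal sketch that runs the attempt above on two of the problem headings. The two functions are copied unchanged; the wrapper addHeadingIds() is a hypothetical name, added only so that the final return statement has a function to live in.

```php
<?php
// Copied from the attempt above so that this snippet runs on its own.
function function_to_makeItClear($string) {
    $string = strtolower($string);
    $string = str_ireplace(' ', '-', $string);
    return preg_replace('/[^A-Za-z0-9\-]/', '', $string);
}

function betterId($match) {
    $escape = str_split(strip_tags($match[2]), 20);
    $id = strlen($escape[0]) >= 5 ? function_to_makeItClear($escape[0]) : str_shuffle('AnyWordsHere');
    return '<h'.$match[1].' id="'.$id.'">'.$match[2].'</h'.$match[1].'>';
}

// Hypothetical wrapper (not in the original code).
function addHeadingIds($texte) {
    return preg_replace_callback('#<h([1-6]).*?>(.*?)<\/h[1-6]>#si', 'betterId', $texte);
}

// A leading space becomes a leading hyphen, and the 20-character cut leaves a trailing one.
echo addHeadingIds('<h2> This heading start with space</h2>'), "\n";
// <h2 id="-this-heading-start-"> This heading start with space</h2>

// The list number is kept, so the id starts with a digit.
echo addHeadingIds('<h3>3. Neither this one</h3>'), "\n";
// <h3 id="3-neither-this-one">3. Neither this one</h3>
```

In both cases an id is produced, but it is not the one wanted: nothing in the attempt trims leading whitespace or skips the numbering before the slug is built.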
Here is some sample text that I want to convert:
<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>
I want to get this result:
<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start"> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>
Update:
Since then, I have implemented another approach using a DOM parser. It works well, but in some cases it fails and I have to add the id manually.
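Use the nodeValue property to get a tag-free string from the heading's contents. (Demonstration of what nodeValue generates)
Use preg_match() to skip leading whitespace and list numbering, then match the first two or three words. The {1,2} quantifier in the pattern requires at least one more word after the first, so a single-word heading is not matched. (A slightly altered demonstration of the pattern)
If there is a match, convert it to lowercase, replace the spaces with hyphens, and add that string as the id attribute.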
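The steps above can be tried in isolation with a small sketch like the following. It assumes the same pattern as the answer's code; the helper name slugFromHeading() is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper (not part of the original answer): builds an id
// from the first two or three words of a heading, or returns null.
function slugFromHeading(string $text): ?string
{
    // Skip leading whitespace and "1."-style numbering, restart the match
    // with \K, then capture two or three whitespace-separated words.
    if (preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){1,2}~', $text, $m)) {
        return str_replace(' ', '-', strtolower($m[0]));
    }
    return null;
}

$dom = new DOMDocument;
$dom->loadHTML(
    '<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
$heading = $dom->getElementsByTagName('h3')->item(0);

// nodeValue is a property; it concatenates the text of all descendants.
echo $heading->nodeValue, "\n";                                   // 1. This wont work
echo slugFromHeading($heading->nodeValue), "\n";                  // this-wont-work
echo slugFromHeading(' This heading start with space'), "\n";     // this-heading-start
var_dump(slugFromHeading('Introduction'));                        // NULL
```

The last call shows the limit of the {1,2} quantifier: a one-word heading gets no id. Changing the quantifier to {0,2} would accept it, at the cost of shorter and more collision-prone ids.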
Code: (Demo)
$html = <<<HTML
<div>
<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>
</div>
HTML;
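$dom = new DOMDocument;
// Parse the fragment without adding <html>/<body> wrappers or a doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Visit every <h2> and <h3> in the document.
foreach ($xpath->query("//h2 | //h3") as $node) {
    // Skip leading whitespace and "1."-style numbering, then capture two or three words.
    if (preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){1,2}~', $node->nodeValue, $m)) {
        $node->setAttribute('id', str_replace(' ', '-', strtolower($m[0])));
    }
}
echo $dom->saveHTML();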
Output:
<div>
<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start"> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>
</div>