How can I create id attributes for <h2> and <h3> tags based on part of their respective innerHTML? [closed]

Question (votes: -1, answers: 1)
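I have a client who uses the TinyMCE TOC plugin but dislikes the random IDs it adds to heading tags (<h2>, <h3>).

I want to write a script that parses an article, targets every h2 and h3 tag, and creates an id attribute from the text each one contains.

I thought I could do this with preg_replace_callback(), but once I used that function I realized it does not work in some cases.

For example, it does not work when the h2/h3 text starts with a space, a number, and so on.

Here is an early attempt that works in some cases: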

function function_to_makeItClear($string) {
    $string = strtolower($string);
    $string = str_ireplace(' ', '-', $string);
    return preg_replace('/[^A-Za-z0-9\-]/', '', $string);
}

function betterId($match){
    $escape = str_split(strip_tags($match[2]), 20);
    $id = strlen($escape[0]) >=5 ? function_to_makeItClear($escape[0]) : str_shuffle('AnyWordsHere');
    return '<h'.$match[1].' id="'.$id.'">'.$match[2].'</h'.$match[1].'>';
}

return preg_replace_callback('#<h([1-6]).*?>(.*?)<\/h[1-6]>#si', 'betterId', $texte);
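To make the failure cases concrete, here is a minimal sketch of what the slug steps in the attempt do to a few of the sample headings. The helper name slugLikeAttempt() is hypothetical and not part of the original code; it repeats the same string operations but leaves out the length check and the random fallback.

```php
<?php
// Hypothetical helper, not part of the original attempt: it repeats the same
// steps (strip tags, keep the first 20 characters, lowercase, hyphenate,
// drop every other character) but omits the length check and random fallback.
function slugLikeAttempt(string $headingHtml): string
{
    $text = str_split(strip_tags($headingHtml), 20)[0];
    $text = str_ireplace(' ', '-', strtolower($text));
    return preg_replace('/[^A-Za-z0-9\-]/', '', $text);
}

$headings = [
    'This will work without problem',
    ' This heading start with space',
    '1. <a href="https://www.example1.com/">This wont work</a>',
    '3. Neither this one',
];

foreach ($headings as $heading) {
    echo slugLikeAttempt($heading), PHP_EOL;
}
// Prints:
// this-will-work-witho
// -this-heading-start-
// 1-this-wont-work
// 3-neither-this-one
```

The leading space turns into a leading hyphen, the list number stays in the id, and the 20-character cut can split a word, which is why these headings do not come out as wanted.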

Here is some sample text I want to convert:

<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>

This is the result I want:

<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start"> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>

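Update:

Since then I have implemented another approach using a DOM parser. It works well, but it still fails in some cases, and I have to add the ids manually myself.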


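Answer (votes: 0)

Use DOMDocument and its good friend XPath to reliably extract the heading tags from valid HTML.

Use the nodeValue property to get a tag-free string from each heading's content. (nodeValue is a property, not a method, so it is read without parentheses.)

Use preg_match() to skip leading whitespace and a leading number such as "1.", then match the first few words. As written in the code for this answer, the {1,2} quantifier makes the pattern match two or three words; changing it to {0,2} would also accept single-word headings.

If there is a match, lowercase it, replace the spaces with hyphens, and add the resulting string as the id attribute.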
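As a small illustration of the middle two steps, the sketch below prints what nodeValue returns for a heading that contains an anchor, and what id the pattern yields from it. The helper name headingId() and the one-word "Overview" heading are assumptions added for the demonstration; the pattern and the libxml flags are the ones used in the answer's code.

```php
<?php
// Hypothetical helper: returns the id the steps above would produce,
// or null when the pattern does not match the heading text.
function headingId(DOMNode $heading): ?string
{
    // Skip leading whitespace and an optional "1." style number (\K drops
    // them from the match), then take the next two or three words.
    $pattern = '~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){1,2}~';
    if (!preg_match($pattern, $heading->nodeValue, $m)) {
        return null;
    }
    return str_replace(' ', '-', strtolower($m[0]));
}

$dom = new DOMDocument;
// The wrapping <div> gives the fragment a single root element.
$dom->loadHTML(
    '<div><h3>2. <a href="https://www.example2.com/">Not working</a></h3><h3>Overview</h3></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

foreach ($dom->getElementsByTagName('h3') as $h3) {
    // nodeValue is the heading's text with all markup removed
    printf("%s => %s\n", $h3->nodeValue, headingId($h3) ?? '(no id)');
}
// Prints:
// 2. Not working => not-working
// Overview => (no id)
```

The second heading shows the two-word minimum in practice: a single-word heading gets no id unless the quantifier is relaxed.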

Code:

$html = <<<HTML
<div>
<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>
</div>
HTML;

$dom = new DOMDocument;
// Load the fragment without adding <html>/<body> wrappers or a doctype
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//h2 | //h3") as $node) {
    // Skip leading whitespace and an optional "1." style number, then match the next two or three words
    if (preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){1,2}~', $node->nodeValue, $m)) {
        $node->setAttribute('id', str_replace(' ', '-', strtolower($m[0])));
    }
}
echo $dom->saveHTML();

Output:

<div>
<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start"> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>
</div>

php regex attributes domparser html-heading
