How can I create id attributes for <h2> and <h3> tags based on part of their respective innerHTML? [closed]

Question (votes: -1, answers: 1)

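I have a client who uses the TinyMCE TOC plugin but dislikes the random IDs it adds to heading tags (<h2> and <h3>).

I want to write a script that parses an article, targets every h2 and h3 tag, and creates an id attribute from the text each one contains.

I thought I could do this with preg_replace_callback(), but once I used that function I realized it does not work in some cases.

For example, it does not work when the h2/h3 text starts with a space, a number, and so on.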

Here is an early attempt that works in some cases:

function function_to_makeItClear($string) {
    $string = strtolower($string);
    $string = str_ireplace(' ', '-', $string);
    return preg_replace('/[^A-Za-z0-9\-]/', '', $string);
}

function betterId($match){
    $escape = str_split(strip_tags($match[2]), 20);
    $id = strlen($escape[0]) >=5 ? function_to_makeItClear($escape[0]) : str_shuffle('AnyWordsHere');
    return '<h'.$match[1].' id="'.$id.'">'.$match[2].'</h'.$match[1].'>';
}

return preg_replace_callback('#<h([1-6]).*?>(.*?)<\/h[1-6]>#si', 'betterId', $texte);

Here is some sample text that I want to convert:

<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>

I would like to get this result:

<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start"> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>

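Update:

Since then, I have implemented another approach using a DOM parser. The results are good, but it still fails in some cases, and I then have to add the ids manually.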

1 Answer (0 votes)

Use DOMDocument and its good friend XPath to reliably extract the heading tags from valid HTML.

Use the nodeValue property (it is a property, not a method) to get a tag-free string from the heading's inner content.
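To see what that property yields, here is a minimal sketch that loads one of the sample headings and prints its nodeValue. The variable names are illustrative and not taken from the answer.

```php
<?php
// Load a single heading; the flags keep libxml from adding
// <html>/<body> wrappers and a doctype.
$dom = new DOMDocument();
$dom->loadHTML(
    '<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// $heading is an illustrative name for the first (and only) <h3> element.
$heading = $dom->getElementsByTagName('h3')->item(0);

// nodeValue returns the concatenated text of the element and its descendants,
// so the <a> markup is gone but its text remains.
echo $heading->nodeValue, PHP_EOL; // prints: 1. This wont work
```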

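Use preg_match() to skip leading whitespace and an optional numeric prefix such as "1.", then match the first one, two, or three words.

If the match contains at least one word, lowercase it, replace the spaces with hyphens, and add the resulting string as the id attribute.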
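The last two steps can be tried in isolation on plain strings. The sketch below is an illustration rather than the answer's exact code: the helper name headingSlug() is hypothetical, and it assumes the quantifier {0,2} so that a one-word heading still matches (a quantifier of {1,2} would require at least two words).

```php
<?php
// Hypothetical helper (the name is an assumption): returns an id string,
// or null when the text contains no word to build one from.
function headingSlug(string $text): ?string
{
    // ^\s*                 skip leading whitespace
    // (?:\d+\.)?           skip an optional numeric prefix such as "1."
    // \s*\K                skip more whitespace, then drop everything matched so far
    // \S+(?:\s+\S+){0,2}   keep one to three whitespace-separated words
    if (!preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){0,2}~', $text, $m)) {
        return null;
    }

    // Lowercase the matched words and join them with hyphens.
    return str_replace(' ', '-', strtolower($m[0]));
}

$samples = [' This heading start with space', '1. This wont work', '2. Not working', '   '];
foreach ($samples as $text) {
    var_dump(headingSlug($text));
}
```

Because str_replace() only swaps single space characters, words separated by tabs or by several spaces would need extra handling, for example a preg_replace() on \s+.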

Code:

$html = <<<HTML
<div>
<p>Paragraph one is okay </p>
<h2>This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2> This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3>1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3>2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3>3. Neither this one</h3>
<h3>But this works again</h3>
</div>
HTML;

$dom = new DOMDocument;
// Parse the fragment without adding <html>/<body> wrappers or a doctype
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Visit every h2 and h3 in document order
foreach ($xpath->query("//h2 | //h3") as $node) {
    // Skip leading whitespace and an optional "1." prefix, then keep one to three words
    // ({0,2} rather than {1,2}, so that a one-word heading also gets an id)
    if (preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){0,2}~', $node->nodeValue, $m)) {
        $node->setAttribute('id', str_replace(' ', '-', strtolower($m[0])));
    }
}

echo $dom->saveHTML();

Output:

<div>
<p>Paragraph one is okay </p>
<h2 id="this-will-work">This will work without problem</h2>
<p>Paragraph two is okay </p>
<h2 id="this-heading-has"><a href="#">This heading has anchor</a></h2>
<p>Paragraph one is okay </p>
<h2 id="this-heading-start">  This heading start with space</h2>
<p>Paragraph two is okay </p>
<h3 id="this-wont-work">1. <a href="https://www.example1.com/">This wont work</a></h3>
<p>Paragraph one is okay </p>
<h3 id="not-working">2. <a href="https://www.example2.com/">Not working</a></h3>
<p>Paragraph two is okay </p>
<h3 id="neither-this-one">3. Neither this one</h3>
<h3 id="but-this-works">But this works again</h3>
</div>
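To apply the same steps to arbitrary article markup, the logic could be wrapped in a function. This is only a sketch under stated assumptions: the name addHeadingIds() is hypothetical, the quantifier {0,2} is assumed so that one-word headings also receive an id, and the input is wrapped in a temporary <div> because, with LIBXML_HTML_NOIMPLIED, libxml can restructure fragments that have several top-level elements (the sample input above is wrapped in a <div> as well).

```php
<?php
// Hypothetical helper: the name and signature are assumptions, not part of the answer.
function addHeadingIds(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in one root element before parsing.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//h2 | //h3') as $node) {
        // Same pattern as in the step-by-step description: skip whitespace and
        // an optional "1." prefix, then keep up to three words.
        if (preg_match('~^\s*(?:\d+\.)?\s*\K\S+(?:\s+\S+){0,2}~', $node->nodeValue, $m)) {
            $node->setAttribute('id', str_replace(' ', '-', strtolower($m[0])));
        }
    }

    // Serialize only the children of the temporary wrapper.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo addHeadingIds('<p>Intro</p><h2> This heading start with space</h2><h3>3. Neither this one</h3>');
```

Headings whose text contains no word are left untouched, so they keep whatever id they already had.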

php regex attributes domparser html-heading

