Why does Nokogiri's to_xhtml create new `id` attributes from `name`?

Question

Consider the following code:

require 'nokogiri' # v1.5.2
doc = Nokogiri.XML('<body><a name="foo">ick</a></body>')

puts doc.to_html
#=> <body><a name="foo">ick</a></body>

puts doc.to_xml
#=> <?xml version="1.0"?>
#=> <body>
#=>   <a name="foo">ick</a>
#=> </body>

puts doc.to_xhtml
#=> <body>
#=>   <a name="foo" id="foo">ick</a>
#=> </body>

Note the new `id` attribute that has been created.

Answer 1

Apparently this is a feature of libxml2. At http://www.w3.org/TR/xhtml1/#h-4.10 we find:

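In XML, fragment identifiers are of type `ID`, and there can only be a single attribute of type `ID` per element. Therefore, in XHTML 1.0 the `id` attribute is defined to be of type `ID`. In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the `id` attribute when defining fragment identifiers on the elements listed above.
[...]
Note that in XHTML 1.0, the `name` attribute of these elements is formally deprecated, and will be removed in a subsequent version of XHTML.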

The best "workaround" I've come up with is:

# Destroy all <a name="..."> elements, replacing with children
# if another element with a conflicting id already exists in the document
doc.xpath('//a[@name][not(@id)][not(@href)]').each do |a|
  a.replace(a.children) if doc.at_css("##{a['name']}")
end

Answer 2

Perhaps you could add some other `id` value to these elements to prevent libxml from adding its own.

doc.xpath('//a[@name and not(@id)]').each do |n|
  n['id'] = n['name'] + 'some_suffix'
end

(Obviously, you will need to determine how to create unique `id` values for your document.)
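One possible way to pick unique values is sketched below in plain Ruby. The helper name `unique_id`, the `-N` suffix scheme, and the `taken` set are assumptions made for illustration; none of them come from Nokogiri or libxml2. The idea is simply to keep a set of ids already in use and append a counter until the candidate is free.

```ruby
require 'set'

# Hypothetical helper (not part of Nokogiri): returns an id derived from
# `base` that is not yet in `taken`, and records the result in `taken`.
def unique_id(base, taken)
  candidate = base
  counter = 1
  # Append an increasing numeric suffix until the candidate is unused.
  while taken.include?(candidate)
    counter += 1
    candidate = "#{base}-#{counter}"
  end
  taken << candidate
  candidate
end

taken = Set.new(%w[foo bar])   # ids assumed to exist in the document already
puts unique_id('foo', taken)   # prints foo-2
puts unique_id('foo', taken)   # prints foo-3
puts unique_id('baz', taken)   # prints baz
```

With a real document, `taken` could presumably be seeded from `doc.xpath('//@id').map(&:value)`, and the loop above would then assign `n['id'] = unique_id(n['name'], taken)` instead of a fixed suffix.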
