Why does HTML::fragment not work when ::XML does?

Question

I have been trying to do some processing on HTML fragments, and while using Nokogiri I ran into a problem I cannot make sense of.

I am using this code:

doc = Nokogiri::HTML::fragment xml

doc.search("//text()[contains(.,'#{@wordbefore}')]").each do |node|
  node.replace(node.content.gsub(/#{@wordbefore}/, ''))
end

With that code, the entire each block is skipped. However, if I use:

doc = Nokogiri::XML xml

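it does work. I have been trying to figure out why one works and the other does not. Because I really am passing in fragments, and I do not want the inner elements encoded or an XML namespace added to every fragment, I would much rather keep using HTML::fragment. But I cannot tell whether this is a bug I have hit or just something I am doing wrong.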

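Update: Here is everything I set up from scratch to test this. One more note: I realize this will kill the content inside the term element. In practice that content does not exist when this part runs, because it runs at a different stage, but this was the easiest way for me to get real content.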

require 'nokogiri'

xml = <<-EOXML
<p dir="ltr" class="FM_Body">The Cortex-A5 MPCore processor is a high-performance, low-power, ARM macrocell with an L1 cache subsystem that provides full virtual memory capabilities. Up to four individual cores can be linked in a cache-coherent cluster, under the control of a <term>Snoop Control Unit</term> (SCU), that maintains L1 data cache coherency for memory marked as shared. The Cortex-A5 MPCore processor implements the ARMv7 architecture and runs 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit Java<tm tmtype="tm">Java</tm> bytecodes in Jazelle state.</p>
EOXML

doc = Nokogiri::XML xml
@wordbefore = "Java"

doc.search("//text()[contains(.,'#{@wordbefore}')]").each do |node|
  node.replace(node.content.gsub(/#{@wordbefore}/, ''))
end

p doc.to_xml

Answer 1

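The Nokogiri maintainers got back to me. It turns out this is a known issue with HTML fragments and XPath searches. In my particular case, the solution was to search with .//text() instead of just //text(). The leading dot makes the XPath relative to the fragment itself rather than starting from the document root.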
