How to replace the "inner_html" of text strings with Nokogiri

Question

I would like to take an HTML string and return a mutated version that preserves the HTML structure but obfuscates the text/inner HTML.

For example:

string = "<div><p><h1>this is some sensitive text</h1><br></p><p>more text</p></div>"
obfuscate_html_string(string)
=> "<div><p><h1>**** **** **** **** ****</h1><br></p><p>**** ****</p></div>"

I have experimented, and while the inner_html= method looks like it could be useful, it raises an argument error:

Nokogiri::HTML.fragment(value).traverse { |node| node.content = '***' if node.inner_html }.to_s
=> "***"

Nokogiri::HTML.fragment(value).traverse { |node| node.content ? node.content = '***' : node.to_html }.to_s
=> "***"

Nokogiri::HTML.fragment(value).traverse { |node| node.inner_html = '***' if node.inner_html }.to_s
=> ArgumentError: cannot reparent Nokogiri::XML::Text there

Answer 1

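This should help, but the documentation covers this in more detail.

Your HTML is causing problems because it is invalid. That forces Nokogiri to repair it, and the repair changes the HTML: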

require 'nokogiri'

doc = Nokogiri::HTML("<div><p><h1>this is some sensitive text</h1><br></p><p>more text</p></div>")
doc.errors # => [#<Nokogiri::XML::SyntaxError: 1:53: ERROR: Unexpected end tag : p>]
doc.to_html
# => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n" +
#    "<html><body><div>\n" +
#    "<p></p>\n" +
#    "<h1>this is some sensitive text</h1>\n" +
#    "<br><p>more text</p>\n" +
#    "</div></body></html>\n"

Nokogiri reports that the HTML had an error:

ERROR: Unexpected end tag : p

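This means Nokogiri could not make sense of the HTML and did its best to recover by supplying or changing end tags until the markup made sense to it. It does not mean the resulting HTML is what you, or the author, actually wanted.

From that point on, your attempts to find nodes will likely fail because the DOM has changed.

ALWAYS check errors and, if it is not empty, be very careful.

Even so, from this point, this should work: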

node = doc.at('div h1')
node.inner_html = node.inner_html.tr('a-z', '*')

node = doc.search('div p')[1]
node.inner_html = node.inner_html.tr('a-z', '*')

puts doc.to_html

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><div>
# >> <p></p>
# >> <h1>**** ** **** ********* ****</h1>
# >> <br><p>**** ****</p>
# >> </div></body></html>
