How to make Nokogiri display nodes rather than just HTML

Question (votes: 0, answers: 1)

Right now, parsing some HTML (for example, the Hacker News front page) works fine. I can call something like doc = Nokogiri::HTML(open('news.ycombinator.com')) and I get back a Nokogiri::HTML::Document < Nokogiri::XML::Document.

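The problem is that in the terminal I see the HTML rather than the actual Nokogiri elements. I want to see the elements because they show me valuable information, such as a Nokogiri element's children, an array of links, and so on.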

I am using the Watir gem to get the HTML, like this:

[1] pry(main)> browser = Watir::Browser.new(:firefox)
#<Watir::Browser:0x2c5654b29ef00c22 url="about:blank" title="">
[2] pry(main)> browser.goto('news.ycombinator.com')
"http://news.ycombinator.com"
[3] pry(main)> browser.html

browser.html is an instance variable (I think?) that contains the unparsed HTML.

If I call doc = Nokogiri::HTML.parse(browser.html), this is what I get back at the moment:

[Screenshot: the terminal prints the raw HTML of the page]

Here is what I want to get back:

[Screenshot: the document shown as Nokogiri nodes]

Where am I going wrong?

Adding the raw output, as requested:

Nokogiri::HTML::Document < Nokogiri::XML::Document
[31] pry(main)> doc = Nokogiri::HTML.parse(browser.html)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html op="news">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="referrer" content="origin">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" type="text/css" href="news.css?stXbi7LCyutClfTUMe1b">
            <link rel="shortcut icon" href="favicon.ico">
          <link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
        <title>Hacker News</title>
</head>
<body>
<center><table id="hnmain" width="85%" cellspacing="0" cellpadding="0" border="0" bgcolor="#f6f6ef">
        <tbody>
<tr><td bgcolor="#ff6600"><table style="padding:2px" width="100%" cellspacing="0" cellpadding="0" border="0"><tbody><tr>
<td style="width:18px;padding-right:4px"><a href="https://news.ycombinator.com"><img src="y18.gif" style="border:1px white solid;" width="18" height="18"></a></td>
                  <td style="line-height:12pt; height:10px;"><span class="pagetop"><b class="hnname"><a href="news">Hacker News</a></b>
              <a href="newest">new</a> | <a href="front">past</a> | <a href="newcomments">comments</a> | <a href="ask">ask</a> | <a href="show">show</a> | <a href="jobs">jobs</a> | <a href="submit">submit</a>            </span></td>
<td style="text-align:right;padding-right:4px;"><span class="pagetop">
                              <a href="login?goto=news">login</a>
                          </span></td>
              </tr></tbody></table></td></tr>
<tr id="pagespace" title="" style="height:10px"></tr>
<tr><td>
<table class="itemlist" cellspacing="0" cellpadding="0" border="0">
              <tbody>
<tr class="athing" id="19388248">
      <td class="title" valign="top" align="right"><span class="rank">1.</span></td>      <td class="votelinks" valign="top"><center><a id="up_19388248" href="vote?id=19388248&amp;how=up&amp;goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://www.bennettnotes.com/post/getting-too-absorbed-into-your-side-projects/" class="storylink">Getting Too Absorbed in Your Side Projects</a><span class="sitebit comhead"> (<a href="from?site=bennettnotes.com"><span class="sitestr">bennettnotes.com</span></a>)</span>
</td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext">
        <span class="score" id="score_19388248">42 points</span> by <a href="user?id=_davebennett" class="hnuser">_davebennett</a> <span class="age"><a href="item?id=19388248">1 hour ago</a></span> <span id="unv_19388248"></span> | <a href="hide?id=19388248&amp;goto=news">hide</a> | <a href="item?id=19388248">27 comments</a>              </td>
</tr>
      <tr class="spacer" style="height:5px"></tr>
                <tr class="athing" id="19384878">
      <td class="title" valign="top" align="right"><span class="rank">2.</span></td>      <td class="votelinks" valign="top"><center><a id="up_19384878" href="vote?id=19384878&amp;how=up&amp;goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://www.nytimes.com/2019/03/13/technology/facebook-data-subpoenas.html" class="storylink">Facebook’s Data Deals Are Under Criminal Investigation</a><span class="sitebit comhead"> (<a href="from?site=nytimes.com"><span class="sitestr">nytimes.com</span></a>)</span>
</td>
</tr>
<tr>
<td colspan="2"></td>
<td class="subtext">
        <span class="score" id="score_19384878">661 points</span> by <a href="user?id=tysone" class="hnuser">tysone</a> <span class="age"><a href="item?id=19384878">13 hours ago</a></span> <span id="unv_19384878"></span> | <a href="hide?id=19384878&amp;goto=news">hide</a> | <a href="item?id=19384878">156 comments</a>              </td>
</tr>
      <tr class="spacer" style="height:5px"></tr>
                <tr class="athing" id="19388091">
      <td class="title" valign="top" align="right"><span class="rank">3.</span></td>      <td class="votelinks" valign="top"><center><a id="up_19388091" href="vote?id=19388091&amp;how=up&amp;goto=news"><div class="votearrow" title="upvote"></div></a></center></td>
<td class="title">
<a href="https://krita.org/en/item/krita-4-2-0-the-first-painting-application-to-bring-hdr-support-to-windows" class="storylink">Krita 4.2.0: First painting application with HDR support on Windows</a><span class="sitebit comhead"> (<a href="from?site=krita.org"><span class="sitestr">krita.org</span></a>)</span>
</td>
...
ruby nokogiri
1 Answer
0 votes

It sounds like you want:

doc = Nokogiri::HTML browser.html