Get the text of the first three <td> elements from each row of a table (cheerio)

Question (votes: 0, answers: 2)

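I want to iterate over each TR and use Cheerio (https://cheerio.js.org/) to add the data from specific TDs to a new object. Each row contains data for name, timestamp, and location, and I need to add each of these values to my object.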

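The documentation mentions:

.get([i]) Retrieve the DOM elements matched by the Cheerio object. If an index is specified, retrieve one of the elements matched by the Cheerio object.

Is this something I can use?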


Code:
const my_object = { number: tracking_number, checkpoints: [] }

$('.table-readonly.table-cargo-flow-road>tbody>tr').each((i, el) => {

  // Here we have access to each TR element. How do I get a specific **TD element** here?
  // pseudocode

  my_object.checkpoints.push({status: $(el).get(0).text(), location: $(el).get(2).text(), timestamp: $(el).get(1).text()})
})

Data returned by the crawler:

  <div role="tabpanel" class="tab-pane active" id="cargo-flow-status">
   <table class="table-readonly table-cargo-flow-road">
    <thead>
    <tr>
        
        <th class="table-readonly__head-item statusname"
            data-tooltip-ellipsis data-title="Status Event">Status Event</th>
        <th class="table-readonly__head-item statusdatetime"
            data-tooltip-ellipsis data-title="Date Time">Date Time</th>
        <th class="table-readonly__head-item statuslocation"
            data-tooltip-ellipsis data-title="Location">Location</th>
        <th class="table-readonly__head-item exception"
            data-tooltip-ellipsis data-title="Exception">Exception</th>
        <th class="table-readonly__head-item remarks"
            data-tooltip-ellipsis data-title="Remark">Remark</th>
    </tr>
    </thead>

    <tbody>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Arrived SD OK">Arrived SD OK</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 3, 2020 | 3:07 PM">
          Jun 3, 2020
          <em>3:07 PM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis data-title="Hagen">Hagen</td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478">5800004568478</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Shipment loaded into linehaul">Shipment loaded into linehaul</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 3, 2020 | 9:42 PM">
          Jun 3, 2020
          <em>9:42 PM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478">5800004568478</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Linehaul Departed">Linehaul Departed</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 3, 2020 | 9:51 PM">
          Jun 3, 2020
          <em>9:51 PM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis data-title="Hagen">Hagen</td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478">5800004568478</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Cartage Truck Departed">Cartage Truck Departed</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 4, 2020 | 8:25 AM">
          Jun 4, 2020
          <em>8:25 AM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis data-title="Langenhagen">Langenhagen</td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478">5800004568478</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Arrival at Delivery Point">Arrival at Delivery Point</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 4, 2020 | 9:37 AM">
          Jun 4, 2020
          <em>9:37 AM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478       ANKUNFT BEIM ZUSTELLORT">5800004568478       ANKUNFT BEIM ZUSTELLORT</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Delivered to Consignee OK">Delivered to Consignee OK</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 4, 2020 | 9:59 AM">
          Jun 4, 2020
          <em>9:59 AM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478                                               FRAU ALTMANN">5800004568478                                               FRAU ALTMANN</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="POD Available">POD Available</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 4, 2020 | 10:12 AM">
          Jun 4, 2020
          <em>10:12 AM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis></td>
    </tr>
    </tbody>
</table>
                </div>
                <div role="tabpanel" class="tab-pane" id="information-flow-status">
                  <table class="table-readonly table-information-flow-road">
    <thead>
    <tr>
        
        <th class="table-readonly__head-item statusname"
            data-tooltip-ellipsis data-title="Status Event">Status Event</th>
        <th class="table-readonly__head-item statusdatetime"
            data-tooltip-ellipsis data-title="Date Time">Date Time</th>
        <th class="table-readonly__head-item statuslocation"
            data-tooltip-ellipsis data-title="Location">Location</th>
        <th class="table-readonly__head-item exception"
            data-tooltip-ellipsis data-title="Exception">Exception</th>
        <th class="table-readonly__head-item remarks"
            data-tooltip-ellipsis data-title="Remark">Remark</th>
    </tr>
    </thead>

    <tbody>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Booking Accepted">Booking Accepted</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 3, 2020 | 1:04 PM">
          Jun 3, 2020
          <em>1:04 PM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="5800004568478">5800004568478</td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="POD URL Available">POD URL Available</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jun 4, 2020 | 11:15 PM">
          Jun 4, 2020
          <em>11:15 PM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis data-title="Bad Hersfeld">Bad Hersfeld</td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis></td>
    </tr>
    <tr>
        
        
        <td class="table-readonly__cell t-statusname statusname"
            data-tooltip-ellipsis data-title="Invoice/Credit Received in KNLogin">Invoice/Credit Received in KNLogin</td>
        <td class="table-readonly__cell t-statusdatetime statusdatetime"
            data-tooltip-ellipsis data-title="Jul 2, 2020 | 6:07 AM">
          Jul 2, 2020
          <em>6:07 AM</em>
        </td>
        <td class="table-readonly__cell t-statuslocation statuslocation"
            data-tooltip-ellipsis data-title="Hagen">Hagen</td>
        <td class="table-readonly__cell t-exception exception"
            data-tooltip-ellipsis></td>
        <td class="table-readonly__cell t-remarks remarks"
            data-tooltip-ellipsis data-title="075510533">075510533</td>
    </tr>
    </tbody>
  </table>
</div>
javascript cheerio
2 Answers
0
votes

The existing answer is correct, but here is a more precise and complete version that separates the date from the time:

const cheerio = require("cheerio"); // 1.0.0-rc.12

const html = `<HTML copied from the original post>`;

const $ = cheerio.load(html);
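// Select only tbody rows: the header <th> cells carry the same class names,
// so a plain "tr" selector would also produce an entry for the header row.
const data = [...$("#cargo-flow-status tbody tr")].map(e => ({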
  name: $(e).find(".statusname").text(),
  location: $(e).find(".statuslocation").text(),
  time: $(e).find(".statusdatetime em").text(),
  date: $(e)
    .find(".statusdatetime")
    .contents()
    .first()
    .text()
    .trim(),
}));
console.log(data);
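
As an alternative to walking the child nodes, each date cell in the posted HTML also carries a `data-title` attribute such as `Jun 3, 2020 | 3:07 PM`, which already holds both parts. The following is only a sketch under that assumption; `splitDateTime` is a hypothetical helper name, not part of Cheerio or of the original answer.

```javascript
// Hypothetical helper: splits a data-title value such as "Jun 3, 2020 | 3:07 PM"
// into its date and time parts.
function splitDateTime(title) {
  // attr() returns undefined when the attribute is missing, so fall back to "".
  const [date = "", time = ""] = (title ?? "").split("|").map(s => s.trim());
  return { date, time };
}

// Possible use inside the map callback above (an assumption, not the original code):
//   ...splitDateTime($(e).find(".statusdatetime").attr("data-title"))

console.log(splitDateTime("Jun 3, 2020 | 3:07 PM"));
```

This relies on the site continuing to use ` | ` as the separator, so the node-based approach above may be more robust if the attribute format changes.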

0
votes

I found the answer:

.find(node)

Get the descendants of each element in the current set of matched elements, filtered by a selector, jQuery object, or element.

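  $('.table-readonly.table-cargo-flow-road>tbody>tr').each((i, el) => {
      // el is the current TR element; find() looks up each cell by its class name
      my_object.checkpoints.push({
          status: $(el).find('.statusname').text(),
          location: $(el).find('.statuslocation').text(),
          // trim() strips the leading and trailing whitespace around the date and time
          timestamp: $(el).find('.statusdatetime').text().trim()
      })
  })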