How can I get data from a site with cheerio when part of it is only loaded later?


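I want to get the rows of the table on this site with cheerio, but because the site has to finish loading, only the first 10 rows show up. How can I get all the rows of this table? coinmarketcap.com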

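On this site, the table on the first page has 100 rows. I need the information for all 100 rows, but the code I wrote only returns the first 10. When the site loads, only the first 10 rows are present at first, and the rest appear after loading.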

const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");

let PORT = 8000;
let link = "https://coinmarketcap.com"; // name must match the variable passed to axios.get below

const app = express();

axios.get(link).then((response) => {
  const html = response.data;
  const $ = cheerio.load(html);

  $(".coin-logo").each(function (i) {
    console.log($(this).attr("src"), i);
  });
});

app.listen(PORT, () => console.log(`server is running on PORT: ${PORT}`));

In the console:

server is running on PORT: 8000
https://s2.coinmarketcap.com/static/img/coins/64x64/1.png 0
https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png 1
https://s2.coinmarketcap.com/static/img/coins/64x64/825.png 2
https://s2.coinmarketcap.com/static/img/coins/64x64/1839.png 3
https://s2.coinmarketcap.com/static/img/coins/64x64/3408.png 4
https://s2.coinmarketcap.com/static/img/coins/64x64/52.png 5
https://s2.coinmarketcap.com/static/img/coins/64x64/2010.png 6
https://s2.coinmarketcap.com/static/img/coins/64x64/3890.png 7
https://s2.coinmarketcap.com/static/img/coins/64x64/74.png 8
https://s2.coinmarketcap.com/static/img/coins/64x64/5426.png 9

Only the first ten rows are returned, but the table has 100 rows.

node.js api axios cheerio
1 Answer

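This is a React/Next.js app, which means the data isn't in the static HTML that axios requests; it's added to the DOM by JS after the page loads. The data for single-page applications (SPAs) is usually passed in through an API endpoint, which you can often hit directly if it's unsecured.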

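In this case, the data is (luckily) in a <script id="__NEXT_DATA__"> tag, which JS uses after the page loads to create the visible elements you see in dev tools. You can get the data as follows: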

const axios = require("axios");
const cheerio = require("cheerio");
require("util").inspect.defaultOptions.depth = null;

const url = "<Your URL>";

axios.get(url).then(response => {
  const html = response.data;
  const $ = cheerio.load(html);
  const payload = $("#__NEXT_DATA__").first().text();
  const {data} = JSON.parse(JSON.parse(payload).props.initialState)
    .cryptocurrency.listingLatest;
  console.log(data);
});
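
The two nested JSON.parse calls above are needed because props.initialState is itself a JSON-encoded string stored inside the outer JSON payload. Here is a minimal offline sketch of that idea, assuming a tiny made-up page: sampleHtml and the extractListing helper are hypothetical, and a regex stands in for cheerio only so the snippet runs without dependencies or network access.

```javascript
// Hypothetical, heavily trimmed stand-in for the real page's state.
const innerState = JSON.stringify({
  cryptocurrency: {
    listingLatest: {
      data: [{ keysArr: ["id", "name"] }, [1, "Bitcoin"], [1027, "Ethereum"]],
    },
  },
});
const sampleHtml =
  '<html><body><script id="__NEXT_DATA__" type="application/json">' +
  JSON.stringify({ props: { initialState: innerState } }) +
  "</script></body></html>";

// Hypothetical helper: pull out the script body and parse both JSON layers.
function extractListing(html) {
  const match = html.match(
    /<script id="__NEXT_DATA__"[^>]*>([\s\S]*?)<\/script>/
  );
  if (!match) {
    throw new Error("__NEXT_DATA__ script tag not found");
  }
  const outer = JSON.parse(match[1]); // first layer: the page payload
  const state = JSON.parse(outer.props.initialState); // second layer: a JSON string
  return state.cryptocurrency.listingLatest.data;
}

const listing = extractListing(sampleHtml);
console.log(listing.length); // header entry plus two rows
```

If the site changes where it stores this state, the property path in extractListing would need to be adjusted; throwing on a missing script tag at least makes that failure easy to spot.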

The structure is compressed and the rows have no keys. If you want to map the keys onto the data to make it more readable, you can:

const payload = $("#__NEXT_DATA__").first().text();
const {data} = JSON.parse(
  JSON.parse(payload).props.initialState
).cryptocurrency.listingLatest;
const [{keysArr}, ...rest] = data;
const withKeys = rest.map(e =>
  Object.fromEntries(
    e.map((e, i) => [keysArr[i] ?? "unknown", e])
  )
);
console.log(withKeys.slice(0, 10));
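
To make the mapping step concrete, here is the same technique applied to a small made-up array in the same shape (a header object followed by positional rows). The sampleData values and the sample* names are hypothetical and only illustrate the transformation, including the "unknown" fallback for a value with no matching key.

```javascript
// Hypothetical sample in the same shape: a header object followed by rows.
const sampleData = [
  { keysArr: ["id", "name", "quote.USD.price"] },
  [1, "Bitcoin", 28422.37],
  [1027, "Ethereum", 1809.54, "extra"], // one more value than there are keys
];

// Split off the header, then pair each value with the key at the same index.
const [{ keysArr: sampleKeys }, ...sampleRows] = sampleData;
const sampleWithKeys = sampleRows.map(row =>
  Object.fromEntries(
    row.map((value, i) => [sampleKeys[i] ?? "unknown", value])
  )
);
console.log(sampleWithKeys);
```

Note that keys such as "quote.USD.price" are flat strings, not nested paths, so they have to be read with bracket notation (row["quote.USD.price"]).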

Now, here's code that shows data similar to the first few columns on the site:

const summary = withKeys.map(e => ({
  "id": e.id,
  "name": e.name,
  "symbol": e.symbol,
  "price": e["quote.USD.price"],
  "1h": e["quote.USD.percentChange1h"],
  "24h": e["quote.USD.percentChange24h"],
  "marketCap": e["quote.USD.marketCap"],
}));
console.log(summary);
console.log(summary.length); // => 100

Output:

[
  {
    id: 1,
    name: 'Bitcoin',
    symbol: 'BTC',
    price: 28422.366435538406,
    '1h': -0.02014955,
    '24h': 5.28633725,
    marketCap: 549443765021.2035
  },
  {
    id: 1027,
    name: 'Ethereum',
    symbol: 'ETH',
    price: 1809.5420505966479,
    '1h': 0.00783376,
    '24h': 3.55526047,
    marketCap: 221440656815.19766
  },
  {
    id: 825,
    name: 'Tether',
    symbol: 'USDT',
    price: 0.9998792896202411,
    '1h': -0.00074249,
    '24h': -0.02658359,
    marketCap: 79511295635.83586
  },
  // ...
]