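I'm trying to use cheerio to get the rows from this site, but because the site has to load, it only shows the first 10 rows. How can I get all the rows of the table? coinmarketcap.com
On this site, the table on the first page has 100 rows. I need the information from all 100 rows, but the code I wrote only gives me the first 10. When I load the site, only the first 10 rows are shown at first, and the rest appear after loading.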
const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");
let PORT = 8000;
let link = "https://coinmarketcap.com";
const app = express();
axios.get(link).then((response) => {
  const html = response.data;
  const $ = cheerio.load(html);
  $(".coin-logo").each(function (i) {
    console.log($(this).attr("src"), i);
  });
});
app.listen(PORT, () => console.log(`server is running on PORT: ${PORT}`));
In the console:
server is running on PORT: 8000
https://s2.coinmarketcap.com/static/img/coins/64x64/1.png 0
https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png 1
https://s2.coinmarketcap.com/static/img/coins/64x64/825.png 2
https://s2.coinmarketcap.com/static/img/coins/64x64/1839.png 3
https://s2.coinmarketcap.com/static/img/coins/64x64/3408.png 4
https://s2.coinmarketcap.com/static/img/coins/64x64/52.png 5
https://s2.coinmarketcap.com/static/img/coins/64x64/2010.png 6
https://s2.coinmarketcap.com/static/img/coins/64x64/3890.png 7
https://s2.coinmarketcap.com/static/img/coins/64x64/74.png 8
https://s2.coinmarketcap.com/static/img/coins/64x64/5426.png 9
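This is a React/Next.js application, which means the full table isn't in the static HTML that axios receives; the rows are added to the DOM by JS after the page loads. Data for single-page applications (SPAs) usually arrives through an API endpoint, which you can often access directly if it isn't secured.
In this case, the data is (luckily) in the
<script id="__NEXT_DATA__">
tag, which JS uses after the page loads to create the visible elements you see in dev tools. You can get the data as follows: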
const axios = require("axios");
const cheerio = require("cheerio");
require("util").inspect.defaultOptions.depth = null;
const url = "<Your URL>";
axios.get(url).then(response => {
  const html = response.data;
  const $ = cheerio.load(html);
  const payload = $("#__NEXT_DATA__").first().text();
  const {data} = JSON.parse(JSON.parse(payload).props.initialState)
    .cryptocurrency.listingLatest;
  console.log(data);
});
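The snippet above calls JSON.parse twice because props.initialState is itself a JSON-encoded string nested inside the outer JSON. As a minimal sketch of that step with some error checking, the helper below (extractListing is a hypothetical name, and samplePayload is made-up data that only mimics the nesting) takes the script tag's text and returns the rows. The site's structure may change, so treat the property path as an assumption:

```javascript
// Hypothetical helper: takes the text of the __NEXT_DATA__ script tag and
// returns the listing rows, or throws a descriptive error if the expected
// nesting is not there.
function extractListing(payload) {
  const nextData = JSON.parse(payload);
  const initialState = nextData?.props?.initialState;
  if (typeof initialState !== "string") {
    throw new Error("props.initialState is missing or not a JSON string");
  }
  // Second parse: initialState is a JSON string inside the outer JSON.
  const data = JSON.parse(initialState)?.cryptocurrency?.listingLatest?.data;
  if (!Array.isArray(data)) {
    throw new Error("cryptocurrency.listingLatest.data is missing");
  }
  return data;
}

// Made-up payload that mimics the nesting used above (not real site data).
const samplePayload = JSON.stringify({
  props: {
    initialState: JSON.stringify({
      cryptocurrency: {
        listingLatest: { data: [{ keysArr: ["id", "name"] }, [1, "Bitcoin"]] },
      },
    }),
  },
});

console.log(extractListing(samplePayload));
```

With the real page, you would pass in the payload string read with cheerio instead of samplePayload.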
The structure is compacted and the rows have no keys. If you want to map the headers onto the data to make it more readable, you can:
const payload = $("#__NEXT_DATA__").first().text();
const {data} = JSON.parse(
  JSON.parse(payload).props.initialState
).cryptocurrency.listingLatest;
const [{keysArr}, ...rest] = data;
const withKeys = rest.map(e =>
  Object.fromEntries(
    e.map((e, i) => [keysArr[i] ?? "unknown", e])
  )
);
console.log(withKeys.slice(0, 10));
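To see what the key mapping does without hitting the site, here is a small sketch on made-up data in the same shape: a first element holding keysArr, followed by plain value arrays. The sample* names and values are illustrative assumptions, not real output from the page:

```javascript
// Made-up sample in the same shape as `data`: a header object, then value rows.
const sampleData = [
  { keysArr: ["id", "name", "symbol"] },
  [1, "Bitcoin", "BTC"],
  [1027, "Ethereum", "ETH", "extra"], // one more value than there are keys
];

// Split off the header object and keep the remaining rows.
const [{ keysArr: sampleKeys }, ...sampleRows] = sampleData;

// Pair each value with the key at the same index; values without a
// matching key fall back to "unknown".
const sampleWithKeys = sampleRows.map(row =>
  Object.fromEntries(row.map((value, i) => [sampleKeys[i] ?? "unknown", value]))
);

console.log(sampleWithKeys);
// [
//   { id: 1, name: 'Bitcoin', symbol: 'BTC' },
//   { id: 1027, name: 'Ethereum', symbol: 'ETH', unknown: 'extra' }
// ]
```

Note that if a row has several values beyond the end of the key list, they all map to "unknown" and only the last one survives, since Object.fromEntries keeps the last entry for a repeated key.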
Now, here's code that shows data similar to the first few columns on the site:
const summary = withKeys.map(e => ({
  "id": e.id,
  "name": e.name,
  "symbol": e.symbol,
  "price": e["quote.USD.price"],
  "1h": e["quote.USD.percentChange1h"],
  "24h": e["quote.USD.percentChange24h"],
  "marketCap": e["quote.USD.marketCap"],
}));
console.log(summary);
console.log(summary.length); // => 100
Output:
[
  {
    id: 1,
    name: 'Bitcoin',
    symbol: 'BTC',
    price: 28422.366435538406,
    '1h': -0.02014955,
    '24h': 5.28633725,
    marketCap: 549443765021.2035
  },
  {
    id: 1027,
    name: 'Ethereum',
    symbol: 'ETH',
    price: 1809.5420505966479,
    '1h': 0.00783376,
    '24h': 3.55526047,
    marketCap: 221440656815.19766
  },
  {
    id: 825,
    name: 'Tether',
    symbol: 'USDT',
    price: 0.9998792896202411,
    '1h': -0.00074249,
    '24h': -0.02658359,
    marketCap: 79511295635.83586
  },
  // ...
]