Crawling a dynamic site with Puppeteer

Question

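I am trying to build a simple scraper for the Trailblazer Profile site. I want to get the number of badges and points a user has.

So I am using cheerio and puppeteer to accomplish this.

Here is my code: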

 .get("/:profile", (req,res,next) => {

  const url = "https://trailblazer.me/id/hverma99";

  async function getPage(url) {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle0'});

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();
    return html;
  }

  const html = getPage(url);
  const $ = cheerio.load(html);
  const span = $('.tds-tally__count.tds-tally__count_success');
  console.log(span.text());

});

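Since I am only testing this, I am not using the profile parameter for now.

Problem: whenever I run this code, nothing is printed to the console. If I try without puppeteer, I only get HTML without any data. My expected result is the number of badges and points.

Let me know what is wrong with this code.

Thanks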

Answer 1

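Everything else is correct. All you need to do is await your getPage call, because it is asynchronous. Without await, html is a pending Promise rather than the page's HTML string. Try this: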

.get("/:profile", async (req,res,next) => {

  const url = "https://trailblazer.me/id/hverma99";

  async function getPage(url) {
    const browser = await puppeteer.launch({headless: true});
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle0'});

    const html = await page.content(); // serialized HTML of page DOM.
    await browser.close();
    return html;
  }

  const html = await getPage(url);
  const $ = cheerio.load(html);
  const span = $('.tds-tally__count.tds-tally__count_success');
  console.log(span.text());

});

You also need to mark the route handler as async, like this: async (req, res, next). Otherwise await cannot be used inside it.
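
To see why the missing await matters, here is a minimal sketch that does not need a browser. fakeGetPage is a hypothetical stand-in for getPage (it is not part of puppeteer) and resolves with a fixed HTML string. Called without await, it returns a Promise object, so the parsing step has no real markup to work with, which would explain why nothing useful is printed.

```javascript
// Hypothetical stand-in for getPage(): resolves with a fixed HTML string
// instead of launching a browser (assumption for illustration only).
async function fakeGetPage() {
  return '<span class="count">42</span>';
}

// Without await: an async function always returns a Promise.
const withoutAwait = fakeGetPage();
console.log(withoutAwait instanceof Promise); // true
console.log(typeof withoutAwait); // "object", not "string"

// With await (only allowed inside an async function): the resolved string.
async function main() {
  const html = await fakeGetPage();
  console.log(typeof html); // "string"
  return html;
}

main();
```

The same applies to the route handler above: once it is declared async, await getPage(url) yields the serialized HTML that cheerio.load expects.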
