Puppeteer error: ProtocolError: Protocol error (Target.createTarget): Target closed

Question

I am trying to scrape YouTube Shorts from a specific YouTube channel using Puppeteer running on MeteorJs Galaxy.

Here's the code I have so far:

import puppeteer from 'puppeteer';
import { YouTubeShorts } from '../imports/api/youTubeShorts'; //meteor mongo local instance

let URL = 'https://www.youtube.com/@ummahtoday1513/shorts'

const processShortsData = (iteratedData) => {
    let documentExist = YouTubeShorts.findOne({ videoId:iteratedData.videoId })
    if(documentExist === undefined) {  //undefined meaning this incoming shorts in a new one
        YouTubeShorts.insert({
            videoId: iteratedData.videoId,
            title: iteratedData.title,
            thumbnail: iteratedData.thumbnail,
            height: iteratedData.height,
            width: iteratedData.width
        })
    }
}

const fetchShorts = () => {
        puppeteer.launch({
            headless:true,
            args:[
                '--no-sandbox',
                '--disable-setuid-sandbox',
                '--disable-dev-shm-usage',
                '--single-process'
            ]
        })
        .then( async function(browser){
            async function fetchingData(){
                new Promise(async function(resolve, reject){
                    const page = await browser.newPage();
                
                    await Promise.all([
                        await page.setDefaultNavigationTimeout(0),
                        await page.waitForNavigation({waitUntil: "domcontentloaded"}),
                        await page.goto(URL, {waitUntil:["domcontentloaded", "networkidle2"]}),
                        await page.waitForSelector('ytd-rich-grid-slim-media', { visible:true }),
                        new Promise(async function(resolve,reject){
                            page.evaluate(()=>{
                                const trialData = document.getElementsByTagName('ytd-rich-grid-slim-media');
                                const titles = Array.from(trialData).map(i => {
                                    const singleData = {
                                        videoId: i.data.videoId,
                                        title: i.data.headline.simpleText,
                                        thumbnail: i.data.thumbnail.thumbnails[0].url,
                                        height: i.data.thumbnail.thumbnails[0].height,
                                        width: i.data.thumbnail.thumbnails[0].width,
                                    }
                                    return singleData
                                })
                                resolve(titles);
                            })
                        }),
                    ])
                    await page.close()
                })
                await browser.close()
            }

            async function fetchAndProcessData(){
                const datum = await fetchingData()
                console.log('DATUM:', datum)
            }
            await fetchAndProcessData()
        })
}

fetchShorts();

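I'm struggling with two things here:

async, await and promises, and

finding the reason why Puppeteer outputs the error

ProtocolError: Protocol error (Target.createTarget): Target closed.

in the console.

I'm new to Puppeteer and have tried to learn from various examples on StackOverflow and Google in general, but I still can't get it right.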

Answer 1

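General advice: code slowly and test often, especially when you're in unfamiliar territory. Minimize the problem so you can understand what's failing. There are many issues here, which gives the impression that the code was written in one fell swoop without incremental validation. There's no obvious entry point for debugging.

Let's examine some of the failing patterns.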

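First, basically never use `new Promise()` when you're working with a promise-based API like Puppeteer. This is discussed in the canonical thread "What is the explicit promise construction antipattern and how do I avoid it?", so I'll avoid repeating the answers there.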
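As a brief illustration of that antipattern, here is a minimal sketch. The names `fetchValue`, `wrapped`, `direct` and `sleep` are hypothetical and not part of Puppeteer. It contrasts wrapping an existing promise in `new Promise()` with simply returning it, and shows the one case where the constructor is generally still needed: adapting a callback-based API such as `setTimeout`.

```javascript
// Hypothetical stand-in for any promise-returning API call.
const fetchValue = () => Promise.resolve(42);

// Antipattern: wrapping a promise you already have in a new one.
const wrapped = () =>
  new Promise((resolve, reject) => {
    fetchValue().then(resolve).catch(reject);
  });

// Preferred: return the existing promise directly.
const direct = () => fetchValue();

// Legitimate use of the constructor: adapting a callback-based API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

Both `wrapped()` and `direct()` resolve to the same value; the second just has less code and fewer places for an error to be swallowed.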

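Second, don't mix `async`/`await` and `then`. The point of promises is to flatten code and avoid pyramids of doom. If you find you have 5-6 deeply nested functions, you're misusing promises. In Puppeteer, there's basically no need for `then`.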
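To make the flattening concrete, here is a small sketch using hypothetical stand-in functions (`openThing` and `readTitle` are invented for illustration, not Puppeteer calls). Both versions compute the same result; the second avoids the nesting.

```javascript
// Hypothetical stand-ins for promise-returning steps.
const openThing = async () => ({name: "thing"});
const readTitle = async (thing) => `title of ${thing.name}`;

// Nested `then` callbacks inside an async function: harder to follow.
const nested = async () => {
  return openThing().then((thing) => {
    return readTitle(thing).then((title) => {
      return title.toUpperCase();
    });
  });
};

// Flat equivalent using only `await`.
const flat = async () => {
  const thing = await openThing();
  const title = await readTitle(thing);
  return title.toUpperCase();
};
```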

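Third, setting the timeout to infinity with `page.setDefaultNavigationTimeout(0)` suppresses errors. It's fine if you want a long delay, but if a navigation takes longer than a few minutes, something is wrong and you want an error so you can understand and debug it, rather than having the script wait silently until you kill it, with no clear diagnostics as to what went wrong or where it failed.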
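One possible way to keep a generous but finite timeout is sketched below. The helper name `gotoWithFiniteTimeout` and the 60-second default are assumptions made for illustration; `page` is expected to be a Puppeteer `Page` (or anything with the same `setDefaultNavigationTimeout` and `goto` methods).

```javascript
// Hypothetical helper: navigate with a finite timeout and fail loudly.
const gotoWithFiniteTimeout = async (page, url, timeoutMs = 60_000) => {
  // A finite value means a stuck navigation eventually rejects
  // instead of hanging forever, as it would with a timeout of 0.
  page.setDefaultNavigationTimeout(timeoutMs);

  try {
    return await page.goto(url, {waitUntil: "domcontentloaded"});
  } catch (err) {
    // Add context and rethrow so the failure stays visible to the caller.
    throw new Error(`Navigation to ${url} failed: ${err.message}`, {
      cause: err,
    });
  }
};
```

With this in place, a navigation that never completes surfaces as a rejected promise the caller can log, rather than a silent hang.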

Fourth, watch out for pointless calls to `waitForNavigation`. Code like this doesn't make much sense:

await page.waitForNavigation(...);
await page.goto(...);

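What navigation are you waiting for? This seems ripe for triggering timeouts or, worse yet, hanging infinitely after you've set navigation to never time out.

Fifth, avoid premature abstractions. You have various helper functions, but you haven't established functionally correct code yet, so these just add to the confused state of affairs. Start with correctness, then add abstractions once the cut points become obvious.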
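For contrast, `waitForNavigation` is typically useful when some action other than `goto` triggers the navigation, such as clicking a link. The sketch below assumes a Puppeteer-like `page`; the helper name `clickAndWaitForNavigation` is hypothetical. The wait is started before the click, without an `await` in between, so the navigation can't be missed.

```javascript
// Hypothetical helper: click something that triggers a navigation
// and wait for that navigation to finish.
const clickAndWaitForNavigation = async (page, selector) => {
  const [response] = await Promise.all([
    // Start listening for the navigation first...
    page.waitForNavigation({waitUntil: "domcontentloaded"}),
    // ...then trigger it. Neither call is awaited individually.
    page.click(selector),
  ]);
  return response;
};
```

After `page.goto(...)` no such wait is needed, because `goto` already resolves once the navigation it started has completed.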

Sixth, avoid `Promise.all()` when all of the contents of the array are sequentially `await`ed. In other words:

await Promise.all([
  await foo(),
  await bar(),
  await baz(),
  await quux(),
  garply(),
]);

is identical to:

await foo();
await bar();
await baz();
await quux();
await garply();
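The difference can be observed directly. The following sketch uses hypothetical timer-based tasks (`makeTask`, `runSequential` and `runConcurrent` are invented names) and records when each task starts and ends. With `await` inside the array, the second task doesn't start until the first has finished; without it, both start right away.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical task factory: logs when the task starts and ends.
const makeTask = (name, log) => async () => {
  log.push(`start ${name}`);
  await sleep(10);
  log.push(`end ${name}`);
  return name;
};

// `await` inside the array: each task finishes before the next starts,
// so Promise.all() only ever receives already-resolved values.
const runSequential = async () => {
  const log = [];
  const foo = makeTask("foo", log);
  const bar = makeTask("bar", log);
  await Promise.all([await foo(), await bar()]);
  return log;
};

// No inner `await`: both tasks are started before either finishes.
const runConcurrent = async () => {
  const log = [];
  const foo = makeTask("foo", log);
  const bar = makeTask("bar", log);
  await Promise.all([foo(), bar()]);
  return log;
};
```

In Puppeteer scripts the steps usually depend on each other (navigate, then wait, then extract), so plain sequential `await` statements are normally what you want.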

Seventh, always return promises if you have them:

const fetchShorts = () => {
  puppeteer.launch({
  // ..

should be:

const fetchShorts = () => {
  return puppeteer.launch({
  // ..

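This way, the caller can `await` the function's completion. Without the `return`, the promise is launched into the void and can never be connected to the caller's flow.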
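A tiny sketch of the consequence, using a hypothetical `fakeLaunch` function as a stand-in for `puppeteer.launch()`: the version without `return` hands the caller `undefined`, so there is nothing to wait on and no way to catch its errors.

```javascript
// Hypothetical stand-in for puppeteer.launch().
const fakeLaunch = async () => "browser";

// The promise chain is created but not returned: the caller gets undefined.
const withoutReturn = () => {
  fakeLaunch().then((browser) => `done with ${browser}`);
};

// The promise chain is returned: the caller can await it.
const withReturn = () => {
  return fakeLaunch().then((browser) => `done with ${browser}`);
};
```

Here `await withoutReturn()` completes immediately with `undefined`, while `await withReturn()` waits for the work and yields its result.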

Eighth, `evaluate` doesn't have access to variables in Node, so this pattern doesn't work:

new Promise(resolve => {
  page.evaluate(() => resolve());
});

Instead, avoid the new promise antipattern and use the promise that Puppeteer already returns to you:

await page.evaluate(() => {});

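Better yet, use `$$eval` here, since it's an abstraction over the common pattern of selecting elements first thing in `evaluate`.

Putting all of that together, here's a rewrite: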

const puppeteer = require("puppeteer"); // ^19.6.3

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("ytd-rich-grid-slim-media");
  const result = await page.$$eval("ytd-rich-grid-slim-media", els =>
    els.map(({data}) => ({
      videoId: data.videoId,
      title: data.headline.simpleText,
      thumbnail: data.thumbnail.thumbnails[0].url,
      height: data.thumbnail.thumbnails[0].height,
      width: data.thumbnail.thumbnails[0].width,
    }))
  );
  console.log(result);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

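Note that I use `finally` to ensure the browser is cleaned up, so the process doesn't hang if the code throws.

Now, all we want is a bit of text, so there's no sense in loading much of the extra content YouTube downloads. You can speed up the script by blocking anything that's unrelated to your goal: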
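The cleanup guarantee can be checked in isolation. The sketch below uses a hypothetical fake browser object (`makeFakeBrowser`) and a hypothetical helper (`scrapeWith`) rather than a real Puppeteer instance, and shows that `close()` runs whether the work succeeds or throws.

```javascript
// Hypothetical fake browser that only records whether it was closed.
const makeFakeBrowser = () => ({
  closed: false,
  async close() {
    this.closed = true;
  },
});

// Run some work against the browser and always close it afterwards.
const scrapeWith = async (browser, work) => {
  try {
    return await work(browser);
  } finally {
    // Runs on success and on failure; the error still propagates.
    await browser.close();
  }
};
```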

const [page] = await browser.pages();

await page.setRequestInterception(true);
page.on("request", req => {
  if (
    req.url().startsWith("https://www.youtube.com") &&
    ["document", "script"].includes(req.resourceType())
  ) {
    req.continue();
  }
  else {
    req.abort();
  }
});
// ...

If you're ready to factor this out into a function, you can:

const puppeteer = require("puppeteer"); // ^19.6.3

const fetchShorts = async () => {
  const url = "<Your URL>";
  let browser;

  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();
    await page.goto(url, {waitUntil: "domcontentloaded"});
    await page.waitForSelector("ytd-rich-grid-slim-media");
    return await page.$$eval("ytd-rich-grid-slim-media", els =>
      els.map(({data}) => ({
        videoId: data.videoId,
        title: data.headline.simpleText,
        thumbnail: data.thumbnail.thumbnails[0].url,
        height: data.thumbnail.thumbnails[0].height,
        width: data.thumbnail.thumbnails[0].width,
      }))
    );
  }
  finally {
    await browser?.close();
  }
};

fetchShorts()
  .then(shorts => console.log(shorts))
  .catch(err => console.error(err));

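Keep in mind, though, that making the function responsible for managing the browser resource hampers its reusability and slows it down considerably. I usually let the caller handle the browser and have all of my scraping helpers accept a `page` argument: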

const fetchShorts = async page => {
  const url = "https://www.youtube.com/@ummahtoday1513/shorts";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("ytd-rich-grid-slim-media");
  return await page.$$eval("ytd-rich-grid-slim-media", els =>
    els.map(({data}) => ({
      videoId: data.videoId,
      title: data.headline.simpleText,
      thumbnail: data.thumbnail.thumbnails[0].url,
      height: data.thumbnail.thumbnails[0].height,
      width: data.thumbnail.thumbnails[0].width,
    }))
  );
};

(async () => {
  let browser;

  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();
    console.log(await fetchShorts(page));
  }
  catch (err) {
    console.error(err);
  }
  finally {
    await browser?.close();
  }
})();
