Best way to perform several time-consuming operations on each item of a large array


I have a function in my JavaScript code that loops over an array and performs some time-consuming operations on each item. It currently works fine when the array has only a few items, but I want the code to keep working when the array is large as well. Here is my function:

const fetchAndProcessNews = async (queryString, from) => {
  const query = {
    queryString,
    from,
    size: 1,
  }
  try {
    console.log('Fetching news...')
    const { articles } = await searchApi.getNews(query)
    console.log('total articles fetched:', articles.length)
    console.log('Fetched news:', articles)
    if (articles && articles.length > 0) {
      console.log('Processing news...')
      //looping through all the articles fetched from api
      for (const article of articles) {
        console.log('Processing article with name: ', article.title)
        const { title, sourceUrl, id, publishedAt } = article
        //scraping content from the source url and returning the markup of the single article
        const markup = await scraper(sourceUrl)
        //using gpt to perform some tasks on the markup returned from scraping
        const data = await askGpt(markup)
        //using dall e to generate an image
        const generatedImageUrl = await generateImg(data?.imageDescription)
        //downloading the image from the url and uploading it to s3
        const s3ImageUrl = await generateImgUrl(generatedImageUrl, title, id)
        //uploading the article to strapi using post request
        const newTitle = data?.title
        const newMarkup = data?.content
        const description = data?.abstract
        const categories = data?.categories

        console.log('pushing article to strapi')
        await createPost(
          newTitle,
          description,
          newMarkup,
          s3ImageUrl,
          publishedAt,
          categories
        )
        console.log('article processing completed...')
      }
    } else {
      console.log('No articles found')
    }
  } catch (error) {
    console.error('Error fetching news:', error.message)
  }
}

Let me explain what I am doing: I fetch some news articles from an API, and for each article I perform these tasks:

  1. Scrape the content with Cheerio from the URL provided by the API, which takes some time
  2. Use OpenAI to perform some tasks on the scraped markup, which also takes quite a while
  3. Generate an image with DALL·E, which also takes time
  4. Upload the image to S3
  5. Upload everything to Strapi with a POST request

Now I am worried about how this code will behave if there are 100 or 1000 articles. Will it be able to handle all these time-consuming tasks? How can I optimize it so that it does not crash and works reliably? I don't have much experience, so I am a bit worried. What techniques should I use? Should I use some kind of queue, such as Bull, or batch processing? A detailed answer would be a great help.

javascript arrays performance time-complexity
1 Answer

The idea is not to wait until all the processing steps for one article have completed before starting on the next article.

For example, you could start processing the next article every 100 ms. You could even start them all immediately, but then you risk firing too many requests at the same server and hitting some server-side limit. So a small delay between the launches of the article-processing tasks is probably wiser. For such a delay you can use this generic helper:

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

The simplest way to implement the overall idea is to move the article-processing code into a separate function:

const processArticle = async (article) => {
    console.log('Processing article with name: ', article.title)
    const { title, sourceUrl, id, publishedAt } = article
    //scraping content from the source url and returning the markup of the single article
    const markup = await scraper(sourceUrl)
    //using gpt to perform some tasks on the markup returned from scraping
    const data = await askGpt(markup)
    //using dall e to generate an image
    const generatedImageUrl = await generateImg(data?.imageDescription)
    //downloading the image from the url and uploading it to s3
    const s3ImageUrl = await generateImgUrl(generatedImageUrl, title, id)
    //uploading the article to strapi using post request
    const newTitle = data?.title
    const newMarkup = data?.content
    const description = data?.abstract
    const categories = data?.categories

    console.log('pushing article to strapi')
    await createPost(
      newTitle,
      description,
      newMarkup,
      s3ImageUrl,
      publishedAt,
      categories
    )
    console.log('article processing completed...')
};

No code has changed here; it has simply been moved into its own function.

Now your main function can call this function without awaiting it. Instead, it can capture the promise it returns (which will be in the pending state) and collect those promises in an array. This means multiple requests for different articles will now be in flight without waiting for each response. At the end, you will probably want to wait until all of those promises have settled.

Here is how your original function would then look:

const fetchAndProcessNews = async (queryString, from) => {
  const query = {
    queryString,
    from,
    size: 1,
  }
  try {
    console.log('Fetching news...')
    const { articles } = await searchApi.getNews(query)
    console.log('total articles fetched:', articles.length)
    console.log('Fetched news:', articles)
    if (articles && articles.length > 0) {
      console.log('Processing news...')
      // looping through all the articles fetched from api
      const promises = [];
      for (const article of articles) {
        promises.push(processArticle(article)); // We don't await!
        await delay(100); // Determine which delay is suitable
      }
      // All articles are now being processed; wait for all to finish (optional)
      await Promise.allSettled(promises);
    } else {
      console.log('No articles found')
    }
  } catch (error) {
    console.error('Error fetching news:', error.message)
  }
}
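Because `Promise.allSettled` never rejects, individual article failures will not abort the whole batch, but they will also pass silently unless you inspect the settled results. A small sketch of such an inspection (the `reportOutcomes` helper is illustrative, not part of the answer's code):

```javascript
// Summarize settled results: count successes, log each failure.
const reportOutcomes = (outcomes) => {
  let ok = 0;
  for (const [i, outcome] of outcomes.entries()) {
    if (outcome.status === 'fulfilled') {
      ok++;
    } else {
      console.error(`Article ${i} failed:`, outcome.reason);
    }
  }
  return ok;
};
```

You would call it as `reportOutcomes(await Promise.allSettled(promises))` instead of just awaiting the `allSettled` call.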

The await Promise.allSettled step is optional, but it is useful for callers of fetchAndProcessNews, because the promise they receive will then only resolve once all the work has completed.
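Note that the fixed 100 ms stagger does not cap how many articles are in flight at once: with 1000 articles, hundreds of scrapes could still be running in parallel. If you need a hard concurrency limit, a small hand-rolled pool is enough. The sketch below is a generic helper under that assumption; the `mapWithConcurrency` name and the limit of 5 are illustrative, not part of the answer above:

```javascript
// Run an async worker over items with at most `limit` tasks in flight.
// Returns results in input order; a rejected worker yields { error }.
const mapWithConcurrency = async (items, limit, worker) => {
  const results = new Array(items.length);
  let next = 0;
  const runner = async () => {
    // Each runner pulls the next unclaimed index until the queue drains.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]).catch(error => ({ error }));
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, runner)
  );
  return results;
};
```

With this in place, the loop in `fetchAndProcessNews` could become `await mapWithConcurrency(articles, 5, processArticle)`. Libraries such as `p-limit`, or a job queue like Bull as you mentioned, implement the same idea with extras like retries and persistence.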
