Cheerio JS: return the SRC of all images in a parent div


Consider the following HTML:

<div aria-roledescription="carousel" data-disliderguid="slider772" class="di-slider slider772-slider gmus-1800x760-slider">
<div class="swiper-container">
<div class="swiper-wrapper">
<div
   class="di-slide swiper-slide"
   data-guid="slide2221"
   data-screen="desktop"
   data-title="995_2024_All_Hummer_Evergreen_2024_DWC"
   data-id="2221"
   data-filtervalue=""
   data-swiper-autoplay="3000">
   
   <div class="di-slider-disclaimer">
      <button class="di-slider-disclaimer-toggle" aria-expanded="false">
      <span class="inactive-label">Important Information</span>
      <span class="active-label">Hide Information</span>
      </button>
      <div class="di-slider-disclaimer-container">
         <div class="di-slider-disclaimer-contents">
            Preproduction and simulated models shown throughout. Actual production model may vary. HUMMER EV is available from a GMC EV dealer.                                
         </div>
      </div>
   </div>
   <a class="di-slider-link"
      aria-hidden="true"
      href="/new-vehicles/?_dFR%5Byear%5D%5B0%5D=2024&_dFR%5Bmake%5D%5B0%5D=GMC&_dFR%5Bmodel%5D%5B0%5D=HUMMER+EV&_dFR%5Bmodel%5D%5B1%5D=HUMMER+EV+SUV&_dFR%5Bmodel%5D%5B2%5D=HUMMER+EV+Pickup"
      title=""
      tabindex="-1"
      >
      <picture class="slide-image">
         <source media="(max-width: 767px)"                                     srcset="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_600x400.jpg">
         <source media="(min-width: 768px)"
            srcset="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg">
         <img src="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg"                                      alt="GMC HUMMER EV PICKUP AND SUV"
            style=""
            width="1800" height="760">
      </picture>
   </a>
</div>
<div
   class="di-slide swiper-slide"
   data-guid="slide950"
   data-screen="desktop"
   data-title="Generic"
   data-id="950"
   data-filtervalue=""
   >
<picture class="slide-image">
   <source media="(max-width: 767px)"                                     srcset="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach_mobile.jpg">
   <source media="(min-width: 768px)"
      srcset="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach-1800x760.jpg">
   <img src="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach-1800x760.jpg"                                      alt="Group of 2023 GMC Terrain SUVs parked on beach"
      style="visibility:hidden"
      width="1800" height="760">
</picture>

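I am trying to use Cheerio via ScrapeNinja to return the SRC of every image that is a child of the div with class di-slider, shown on the first line of the HTML snippet. All of the images are HTML picture elements, and all sit inside divs with similar classes. However, the only link I want returned is the img src value.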

When I try running the code below in their sandbox (https://scrapeninja.net/cheerio-sandbox/basic), I get the error "Error: Expected name, found ://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg" on line 19.

Here is the code that produces the error:

// define function which accepts body and cheerio as args
function extract(input, cheerio) {
    // return object with extracted values              
    let $ = cheerio.load(input);
    var listItems = $(".di-slider");
    listItems.each(function(idx, picture) {
    let image= $(picture).find('img').attr('src'); 
    return {
        source: $(image)
    };
});
    
}

I admit I'm not the strongest with JS, I haven't used jQuery in years, and this is my first attempt at using Cheerio or ScrapeNinja.

I have reviewed the docs at https://pixeljets.com/blog/cheerio-sandbox-cheatsheet/#iterate-over-children-and-return-them-as-an-array-of-objects and I based my function on the question "How to get image url via cheerio?".

javascript web-scraping cheerio
1 answer

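A few problems:

  1. .forEach/.each does not return a value. Anything you return from the callback is ignored. .map, on the other hand, builds an array from all of the values returned by the callback, which makes it the best function for this job. You could also push each item onto an array variable, but that is exactly the pattern map is designed to abstract away.
  2. You are not returning anything from extract().
  3. The main cause of the crash is that you are passing a string into a Cheerio object: $("https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg"). Remove the $() here.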
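
To make the first two points concrete, here is a minimal sketch using plain JavaScript arrays rather than Cheerio. The `urls` array and both variable names are made up for illustration. Cheerio's .each likewise discards what the callback returns (apart from `false`, which stops the iteration), so the same reasoning should carry over.

```javascript
// Hypothetical sample data standing in for scraped image URLs.
const urls = ["a.jpg", "b.jpg"];

// forEach ignores whatever the callback returns, so the result is undefined.
const viaForEach = urls.forEach(u => ({ source: u }));

// map collects each callback return value into a new array.
const viaMap = urls.map(u => ({ source: u }));

console.log(viaForEach); // undefined
console.log(viaMap);     // [ { source: 'a.jpg' }, { source: 'b.jpg' } ]
```

The `return` inside the question's .each callback behaves like the forEach case: the object is built and then thrown away, and extract() itself never returns anything.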

Working code:

function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return [...$(".di-slider")].map(e => ({
    source: $(e).find("img").attr("src")
  }));
}
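
One caveat: the function above returns one object per .di-slider element, and .attr("src") reads the attribute from only the first matched img. With the HTML in the question, where both slides sit inside a single .di-slider, that would presumably yield just the first image. If the goal is every image, one possible variant is to select the img elements directly. This is a sketch under that assumption, and `extractAll` is a name invented here, not part of the sandbox.

```javascript
// Sketch: same sandbox-style signature as extract(); `cheerio` is passed in by the caller.
// `extractAll` is a hypothetical name used only for this example.
function extractAll(input, cheerio) {
  const $ = cheerio.load(input);
  // Select every <img> nested anywhere under .di-slider, not just the first one.
  return [...$(".di-slider img")].map(img => ({
    source: $(img).attr("src")
  }));
}
```

Each entry in the returned array then corresponds to one img element rather than one slider container.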