How do I get the text from span tags inside li tags with BeautifulSoup?


I'm trying to use BeautifulSoup to get the product sizes from a website, but I'm stuck. I only need the text:

S, M, L, XL, XXL, XXXL, 4XL, 5XL

Code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myurl = 'https://www.aliexpress.com/item/Vfemage-Womens-Elegant-Ruched-Bow-Contrast-Patchwork-3-4-Sleeve-Vintage-Pinup-Work-Office-Party-Fitted/32831085887.html?spm=2114.search0103.3.12.iQlXqu&ws_ab_test=searchweb0_0,searchweb201602_3_10152_10065_10151_10344_10068_10345_10342_10325_10343_51102_10546_10340_10548_10341_10609_10541_10084_10083_10307_10610_10539_10312_10313_10059_10314_10534_100031_10604_10603_10103_10605_10594_10142_10107,searchweb201603_25,ppcSwitch_5&algo_expid=a3e03a67-d922-4c90-aba7-d3cc80101a75-1&algo_pvid=a3e03a67-d922-4c90-aba7-d3cc80101a75&rmStoreLevelAB=0'

uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

size = page_soup.findAll("ul",{"id":"j-sku-list-2"})
print(size)

It returns:

[
<ul class="sku-attr-list util-clearfix" data-sku-prop-id="5" data-sku-show-type="none" id="j-sku-list-2">
  <li><a data-role="sku" data-sku-id="100014064" href="javascript:void(0)" id="sku-2-100014064"><span>S</span></a></li>
  <li><a data-role="sku" data-sku-id="361386" href="javascript:void(0)" id="sku-2-361386"><span>M</span></a></li>
  <li><a data-role="sku" data-sku-id="361385" href="javascript:void(0)" id="sku-2-361385"><span>L</span></a></li>
  <li><a data-role="sku" data-sku-id="100014065" href="javascript:void(0)" id="sku-2-100014065"><span>XL</span></a></li>
  <li><a data-role="sku" data-sku-id="4182" href="javascript:void(0)" id="sku-2-4182"><span>XXL</span></a></li>
  <li><a data-role="sku" data-sku-id="4183" href="javascript:void(0)" id="sku-2-4183"><span>XXXL</span></a></li>
  <li><a data-role="sku" data-sku-id="200000990" href="javascript:void(0)" id="sku-2-200000990"><span>4XL</span></a></li>
  <li><a data-role="sku" data-sku-id="200000991" href="javascript:void(0)" id="sku-2-200000991"><span>5XL</span></a></li>
</ul>]
python web-scraping beautifulsoup html-parsing
1 Answer

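You need to go one level deeper: find the li elements inside the ul and call get_text() on each one. Note that find() returns a single element, whereas findAll() in the question returns a list, which has no find_all() method of its own.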

sizes = page_soup.find("ul", {"id":"j-sku-list-2"}).find_all("li")
print([size.get_text(strip=True) for size in sizes])
# prints ['S', 'M', 'L', 'XL', 'XXL', 'XXXL', '4XL', '5XL']
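
To try the approach above without hitting the live page, here is a minimal sketch that runs it against a shortened copy of the markup from the question. The helper name extract_sizes and the SAMPLE_HTML constant are made up for illustration. The None check is an assumption on my part: if the list is missing from the downloaded HTML (for example because the page is rendered by JavaScript or its layout has changed), find() returns None and chaining .find_all() on it would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown in the question (illustration only).
SAMPLE_HTML = """
<ul class="sku-attr-list util-clearfix" id="j-sku-list-2">
  <li><a data-role="sku" href="javascript:void(0)"><span>S</span></a></li>
  <li><a data-role="sku" href="javascript:void(0)"><span>M</span></a></li>
  <li><a data-role="sku" href="javascript:void(0)"><span>XXL</span></a></li>
</ul>
"""


def extract_sizes(html, list_id="j-sku-list-2"):
    """Return the text of every <li> inside the <ul> with the given id."""
    page_soup = BeautifulSoup(html, "html.parser")
    size_list = page_soup.find("ul", {"id": list_id})
    if size_list is None:
        # The list is not in this HTML; return an empty result instead of crashing.
        return []
    return [li.get_text(strip=True) for li in size_list.find_all("li")]


print(extract_sizes(SAMPLE_HTML))
# prints ['S', 'M', 'XXL']
```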

Alternatively, a more concise way using a CSS selector:

sizes = page_soup.select("ul#j-sku-list-2 li")
print([size.get_text(strip=True) for size in sizes])
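
Since the question asks for the sizes as plain comma-separated text, the selector can also target the span elements directly and the results can be joined into one string. This is only a sketch: the function name sizes_as_text and the sample markup are hypothetical, and select() relies on the soupsieve package that ships with recent BeautifulSoup 4 releases.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown in the question (illustration only).
SAMPLE_HTML = """
<ul class="sku-attr-list util-clearfix" id="j-sku-list-2">
  <li><a data-role="sku" href="javascript:void(0)"><span>S</span></a></li>
  <li><a data-role="sku" href="javascript:void(0)"><span>XL</span></a></li>
  <li><a data-role="sku" href="javascript:void(0)"><span>4XL</span></a></li>
</ul>
"""


def sizes_as_text(html, selector="ul#j-sku-list-2 li span"):
    """Join the text of every element matched by the CSS selector with ', '."""
    page_soup = BeautifulSoup(html, "html.parser")
    # select() returns a (possibly empty) list, so no None check is needed here.
    return ", ".join(span.get_text(strip=True) for span in page_soup.select(selector))


print(sizes_as_text(SAMPLE_HTML))
# prints S, XL, 4XL
```

If nothing matches the selector, the function returns an empty string rather than raising an error.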