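I'm having a problem getting data from a page with BeautifulSoup. My code runs without errors, but it only returns 33 products instead of the 82 products on the page (all 82 products use the same HTML structure). Below are the code and the HTML.
Python code: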
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup
url = "https://www.zalando.es/chaquetas-hombre/"
# Open the connection, read the page, then close the connection
uClient = uReq(url)
pagehtml = uClient.read()
uClient.close()
page = BeautifulSoup(pagehtml, "html.parser")
resp = page.find_all("z-grid-item", {"class": "cat_card-1o_9G cat_normalWidth-tz8JR"})
print(len(resp))
tot_brand = list()
tot_products = list()
tot_prices = list()
tot_hiperlinks = list()
# Grab the needed data (brand, price, name, etc.)
for i in resp:
    try:
        brand = i.find("div", {"class": "cat_brandName-2XZRz cat_ellipsis-MujnT"}).text
        products = i.find("div", {"class": "cat_articleName--arFp cat_ellipsis-MujnT"}).text
        prices = i.find("div", {"class": "cat_originalPrice-2Oy4G"}).text[0:-2]
        prices = float(prices.replace(",", "."))
        hlink = i.find("a").get("href")
    except (AttributeError, ValueError):
        # Skip cards that lack one of these elements or have an unparsable price
        continue
    # Store each field in its list
    tot_brand.append(brand)
    tot_products.append(products)
    tot_prices.append(prices)
    tot_hiperlinks.append(hlink)
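I don't know whether I've overlooked something, or whether this page simply doesn't let you get all the data you want. If anyone knows why, please let me know.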
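The HTML you download contains only some of the product cards, but the full product data is embedded in the page as JSON (in the element with id z-nvg-cognac-props). You can parse it with the json module instead of relying on the HTML cards.
For example: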
import json
import requests
from bs4 import BeautifulSoup
url = 'https://www.zalando.es/chaquetas-hombre/'
# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# The product data sits in the element with id "z-nvg-cognac-props", wrapped in a CDATA marker
json_data = json.loads(soup.select_one('#z-nvg-cognac-props').contents[0].replace('<![CDATA[', '').replace(']]>', ''))
# uncomment this to print all data:
# print(json.dumps(json_data, indent=4))
for no, article in enumerate(json_data['articles'], 1):
    print('{:<4}{:<25}{:<75}{}'.format(no, article['brand_name'], article['name'], article['url_key']))
Prints:
1 Ellesse LOMBARDY - Chaqueta de entretiempo - dress blues ellesse-chaqueta-de-invierno-dress-blues-el922l007-k11
2 Napapijri SHELTER H - Chaqueta fina - blu marine napapijri-shelter-h-chaqueta-fina-blu-marine-na622t027-k11
3 SIKSILK DETACHABLE HOOD - Chaqueta vaquera - mid wash blue siksilk-detachable-hood-chaqueta-vaquera-mid-wash-blue-sif22t00j-k11
4 Jack & Jones JCOROCKY - Chaqueta de cuero sintético - black jack-and-jones-jcorocky-jacket-chaqueta-de-cuero-sintetico-black-ja222t0b6-q11
5 New Look SLEEVE - Chaqueta vaquera - light blue new-look-sleeve-chaqueta-vaquera-light-blue-nl022d07f-k11
6 Selected Homme CLASSIC JACKET - Chaqueta de cuero - black selected-homme-classic-jacket-chaqueta-de-cuero-black-se622t04e-q11
7 Jack & Jones JJIALVIN JJJACKET - Chaqueta vaquera - blue denim jack-and-jones-jjialvin-jjjacket-chaqueta-vaquera-blue-denim-ja222t0c8-k11
8 Nike Sportswear M NSW NIKE AIR JKT WVN - Cortaviento - white/black nike-sportswear-cortaviento-whiteblack-ni122t03x-a11
9 Only & Sons ONSAL - Chaqueta de cuero sintético - black only-and-sons-onsal-chaqueta-de-cuero-sintetico-black-os322t05c-q11
10 Ellesse MONTERINI PADDED - Cortaviento - black ellesse-monterini-cortaviento-black-el922t026-q11
11 Oakwood CASEY - Chaqueta de cuero - bordeaux oakwood-casey-chaqueta-de-cuero-oa122j016-g11
12 Brave Soul SANJAY - Chaquetas bomber - khaki brave-soul-sanjay-chaquetas-bomber-brh22t01c-n11
...
80 Replay Chaqueta vaquera - dark blue replay-chaqueta-vaquera-dark-blue-re322t01o-k11
81 Pier One Forro polar - dark green pier-one-forro-polar-dark-green-pi922s062-m11
82 CELIO RUALF - Chaquetas bomber - kaki celio-rualf-chaquetas-bomber-cf522t01h-n11
83 Levi's® THE TRUCKER JACKET - Chaqueta vaquera - embossed 2 horse trucker levisr-the-trucker-jacket-chaqueta-vaquera-embossed-2-horse-trucker-l1o22t002-k11
84 adidas Performance VARILITE SOFT HOODED - Chaqueta de plumas - carbon adidas-performance-varilite-soft-chaqueta-de-plumas-carbon-ad542f0fb-q11
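The question collects each field into its own list. Below is a minimal sketch that fills equivalent lists from the JSON. It assumes the script content has the same shape as in the answer: an articles list whose items have brand_name, name and url_key. The helpers parse_cdata_json and collect_articles and the two-article sample string are hypothetical illustrations, so the example runs offline without fetching the page. Unlike the chained replace() calls, it strips only the outer CDATA wrapper, so a "]]>" inside a JSON string value survives. The answer does not show which key holds the price, so prices are left out; the commented-out json.dumps line is a way to find that key.

```python
import json

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def parse_cdata_json(raw):
    """Hypothetical helper: parse JSON text optionally wrapped in <![CDATA[ ... ]]>."""
    text = raw.strip()
    # Remove only the outer wrapper, keeping any "]]>" inside string values.
    if text.startswith(CDATA_START) and text.endswith(CDATA_END):
        text = text[len(CDATA_START):-len(CDATA_END)]
    return json.loads(text)


def collect_articles(json_data):
    """Hypothetical helper: split 'articles' into per-field lists like the question's."""
    tot_brand, tot_products, tot_hiperlinks = [], [], []
    for article in json_data.get("articles", []):
        tot_brand.append(article["brand_name"])
        tot_products.append(article["name"])
        tot_hiperlinks.append(article["url_key"])
    return tot_brand, tot_products, tot_hiperlinks


# Hypothetical sample shaped like the page's script content (trimmed to two articles).
sample = (
    '<![CDATA[{"articles": ['
    '{"brand_name": "Ellesse", "name": "LOMBARDY - Chaqueta de entretiempo - dress blues", '
    '"url_key": "ellesse-chaqueta-de-invierno-dress-blues-el922l007-k11"}, '
    '{"brand_name": "Napapijri", "name": "SHELTER H - Chaqueta fina - blu marine", '
    '"url_key": "napapijri-shelter-h-chaqueta-fina-blu-marine-na622t027-k11"}'
    ']}]]>'
)

brands, products, links = collect_articles(parse_cdata_json(sample))
print(len(brands), brands)  # prints: 2 ['Ellesse', 'Napapijri']
```

On the real page, the string passed to parse_cdata_json would be soup.select_one('#z-nvg-cognac-props').contents[0] from the answer's code.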