Extracting an img URL with BeautifulSoup

Question (votes: 0, answers: 1)

I want to scrape an image URL from Amazon, but I can't get it to work. Here is my code:

from bs4 import BeautifulSoup
import requests
from csv import writer

URL = "https://www.amazon.de/gp/product/B01NCX7I5P/ref=ox_sc_act_title_4?smid=A30K5QJOZDKVAA&psc=1"

response = requests.get(URL, headers={"User-Agent":"Mozilla/5.0"})

soup = BeautifulSoup(response.text, "html.parser")

img = soup.find(class_="imgTagWrapper")

print(img)

This is my output:

<div class="imgTagWrapper" id="imgTagWrapperId">
<img alt="TURATA Digitaler Messschieber IPX54 Wasserdichte Schieblehre Edelstahl 150 mm / 6 Zoll f&amp;uuml;r Abst&amp;auml;nden, Durchmesser, Tiefenma&amp;szlig;, mit LCD Display Profimessger&amp;auml;t" class="a-dynamic-image a-stretch-horizontal" data-a-dynamic-image='{"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX342_.jpg":[342,342],"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX466_.jpg":[466,466],"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX425_.jpg":[425,425],"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX522_.jpg":[522,522],"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX385_.jpg":[385,385]}' data-a-image-name="landingImage" data-old-hires="https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SL1500_.jpg" id="landingImage" onload="if(this.width/this.height &gt; 1.0){this.className += ' a-stretch-horizontal'}else{this.className += ' a-stretch-vertical'};this.onload='';" src="
[base64-encoded image data omitted]
" style="max-width:522px;max-height:522px;"/>
</div>

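I need the URL "https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX342_.jpg" on its own, so that I can change the product URL and still get the first image that Amazon shows for that product. I also looked at this thread: How to extract img src from web page via lxml in beautifulsoup using python?, but it does not cover getting a single URL.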

I hope someone can help me. Also, the scraped part looks somewhat different from the HTML I see in Google Chrome.

python beautifulsoup python-requests-html
1 Answer (0 votes)

That is because the image in the src attribute is base64-encoded; the browser renders it once the page is opened.
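
A minimal sketch of how to tell an inline base64 image from a fetchable URL, assuming the inline value is a `data:` URI. The helper name `is_inline_image` and the sample strings are hypothetical and only serve as illustration.

```python
def is_inline_image(src: str) -> bool:
    """Return True if src looks like an inline data: URI rather than a fetchable URL."""
    # strip() because a scraped attribute value may carry surrounding whitespace or newlines
    return src.strip().lower().startswith("data:")


# Hypothetical sample values for illustration
inline_src = "data:image/jpeg;base64,/9j/4AAQSkZJRg=="
real_src = "https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SL1500_.jpg"

print(is_inline_image(inline_src))  # True
print(is_inline_image(real_src))    # False
```

When the check returns True, the `src` value is not a link that can be reused, so the URL has to come from a different attribute of the same tag.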

You can get the image URL from another attribute, like this:

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"}

url = "https://www.amazon.de/gp/product/B01NCX7I5P/ref=ox_sc_act_title_4?smid=A30K5QJOZDKVAA&psc=1"

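# Fetch the raw HTML, sending a browser-like User-Agent with the request
response = requests.get(url, headers=headers).text
soup = BeautifulSoup(response, 'html.parser')
# Select the <img> inside the wrapper element with id="imgTagWrapperId"
imgdata = soup.select("#imgTagWrapperId img")
# data-old-hires holds the URL of the high-resolution image
img_url = imgdata[0].attrs['data-old-hires']
print(img_url)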

Output:

https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SL1500_.jpg
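
The question asked for the 342-pixel URL rather than the high-resolution one. As the question's output shows, the `data-a-dynamic-image` attribute holds a JSON object that maps each image URL to its `[width, height]`, so one possible approach is to parse that JSON and pick the entry by width. The sketch below works on a shortened copy of the attribute value from that output instead of a live request. The helper name `pick_image_url` is hypothetical, and Amazon's markup may change.

```python
import json


def pick_image_url(dynamic_image_json, width):
    """Return the URL whose listed width equals `width`, or None if there is none."""
    # The attribute value is a JSON object of the form {url: [width, height], ...}
    sizes = json.loads(dynamic_image_json)
    for url, (w, _h) in sizes.items():
        if w == width:
            return url
    return None


# Attribute value copied from the question's output, shortened to three entries
sample = (
    '{"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX342_.jpg":[342,342],'
    '"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX466_.jpg":[466,466],'
    '"https://images-na.ssl-images-amazon.com/images/I/61JepX9ctWL._SX522_.jpg":[522,522]}'
)

small_url = pick_image_url(sample, 342)
print(small_url)
```

With the code from the answer, the value to pass in would be `imgdata[0].attrs['data-a-dynamic-image']`. Checking for `None` before using the result guards against pages that do not list the requested width.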