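I'm trying to scrape some data from a website using Python and Beautiful Soup, specifically images in base64 format. However, when I run my code, the image data comes out in a strange format, like this: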
"image": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
Here is the relevant code snippet:
import requests
from bs4 import BeautifulSoup

def search_mercadolivre_by_category(category):
    url = f"https://lista.mercadolivre.com.br/{category}"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    products = soup.find_all("li", {"class": "ui-search-layout__item"})
    results = []
    for product in products:
        title = product.find("h2", {"class": "ui-search-item__title"}).text.strip()
        price = product.find("span", {"class": "price-tag-fraction"}).text.strip()
        link = product.find("a", {"class": "ui-search-link"})['href']
        image = product.find("img")['src']
        results.append({
            "title": title,
            "price": price,
            "link": link,
            "image": image,
            "category": category,
            "website": "Mercado Livre",
            "keyword": ""
        })
    return results
Can anyone help me decode the image data correctly?
I was expecting to find this source there instead:
<img width="160" height="160" decoding="async" src="https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp" class="ui-search-result-image__element shops__image-element" alt="Samsung Galaxy M13 Dual SIM 128 GB verde 4 GB RAM">
That is a data URI. The simplest way to read it is like this:
from urllib import request
with request.urlopen('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7') as DataURI:
    im = DataURI.read()
If you look at the first few bytes, you can see that it is indeed a 1x1 GIF image:
print(im[:10]) # prints b'GIF89a\x01\x00\x01\x00'
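If you would rather not go through urllib, the same bytes can be recovered with just the standard base64 module. This is a minimal sketch, assuming the URI carries the ;base64 marker as yours does. The helper name decode_data_uri is my own, not something from a library:

```python
import base64

def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are handled here")
    mime_type = header[:-len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)

uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
mime_type, data = decode_data_uri(uri)
print(mime_type, data[:6])
```

Either way you end up with the raw bytes of the placeholder GIF, so the steps that follow work the same on data as on im.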
If you want to save it to disk as image.gif, you can use:
from pathlib import Path
Path('image.gif').write_bytes(im)
If you want to open it in PIL, wrap it in BytesIO and open it like this:
from PIL import Image
from io import BytesIO
# Open as PIL Image
PILImage = Image.open(BytesIO(im))
PILImage.show() # display in viewer
PILImage.save('result.png') # save to disk as PNG
I think you need to read data-src instead of src:
image = product.find("img")['data-src']
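If some items on the page have no data-src attribute, that lookup raises a KeyError, so a fallback may be safer. The sketch below assumes the site lazy-loads images and keeps the real URL in data-src, with a base64 placeholder in src; I have not verified that against the live page. pick_image_url is a hypothetical helper name. It takes a plain attribute mapping, so with Beautiful Soup you would pass product.find("img").attrs:

```python
def pick_image_url(attrs):
    """Return the first real image URL from an <img> attribute mapping, or None."""
    # Assumption: lazy-loaded images keep the real URL in data-src,
    # while src only holds an inline data: placeholder.
    for key in ("data-src", "src"):
        value = attrs.get(key)
        if value and not value.startswith("data:"):
            return value
    return None

placeholder = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
real_url = "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp"

lazy_img = {"src": placeholder, "data-src": real_url}   # lazy-loaded image
eager_img = {"src": real_url}                           # already-loaded image
print(pick_image_url(lazy_img))
print(pick_image_url(eager_img))
```

Returning None for an image that only has a placeholder lets the caller decide whether to skip the product or store an empty value.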