Problem decoding base64 image data with Beautiful Soup in Python


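I am trying to scrape some data from a website using Python and Beautiful Soup, specifically the product images. However, when I run my code, the image value comes back as a base64 data URI instead of an image URL, like this: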

"image": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",

Here is the relevant code snippet:

import requests
from bs4 import BeautifulSoup

def search_mercadolivre_by_category(category):
    url = f"https://lista.mercadolivre.com.br/{category}"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    products = soup.find_all("li", {"class": "ui-search-layout__item"})
    results = []
    for product in products:
        title = product.find("h2", {"class": "ui-search-item__title"}).text.strip()
        price = product.find("span", {"class": "price-tag-fraction"}).text.strip()
        link = product.find("a", {"class": "ui-search-link"})['href']
        image = product.find("img")['src']
        results.append({
            "title": title,
            "price": price,
            "link": link,
            "image": image,
            "category": category,
            "website": "Mercado Livre",
            "keyword": ""
        })
    return results

Can anyone help me decode the image data correctly?

I expected to get this source instead:

<img width="160" height="160" decoding="async" src="https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp" class="ui-search-result-image__element shops__image-element" alt="Samsung Galaxy M13 Dual SIM 128 GB verde 4 GB RAM">
python web-scraping base64 decode
2 Answers
0 votes

That is a data URI. The simplest way to read it is like this:

from urllib import request

with request.urlopen('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7') as data_uri:
    im = data_uri.read()

If you look at the first few bytes, you can see that it is indeed a 1x1 GIF image:

print(im[:10])       # prints b'GIF89a\x01\x00\x01\x00'
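
As an alternative to urllib, the payload can be decoded by hand with the standard base64 module. The following is a minimal sketch, assuming the URI always has the form data:<media type>;base64,<payload>; the helper name decode_data_uri is hypothetical and not part of any library.

```python
import base64

data_uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"

def decode_data_uri(uri):
    """Split a base64 data URI into (media type, raw bytes).

    Hypothetical helper: only handles the 'data:<type>;base64,<payload>' form.
    """
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)

media_type, raw = decode_data_uri(data_uri)
print(media_type)   # image/gif
print(raw[:6])      # b'GIF89a'
```

Either way the result is the same tiny placeholder GIF, so decoding it does not give you the product image itself.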

If you want to save it to disk as image.gif, you can use:

from pathlib import Path
Path('image.gif').write_bytes(im)

If you want to open it in PIL, wrap it in BytesIO and open it like this:

from PIL import Image
from io import BytesIO

# Open as PIL Image
PILImage = Image.open(BytesIO(im))

PILImage.show()               # display in viewer
PILImage.save('result.png')   # save to disk as PNG

0 votes

I think you need:

image = product.find("img")['data-src']
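
A slightly more defensive variant is sketched below. It assumes the page lazy-loads its images, so that src holds a data URI placeholder and the real URL sits in data-src; that assumption is not verified against the live site, and the helper name pick_image_url is hypothetical. It relies only on .get(), which both a Beautiful Soup Tag and a plain dict provide.

```python
def pick_image_url(img):
    """Return a usable image URL from an <img> tag or a mapping of its attributes.

    Hypothetical helper: prefers data-src when src is only a data URI placeholder.
    """
    src = img.get("src", "")
    if src.startswith("data:"):
        # Placeholder image: fall back to the lazy-load attribute if it exists.
        return img.get("data-src") or src
    return src

# Stand-in for product.find("img"), using a plain dict of attributes.
example_img = {
    "src": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
    "data-src": "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp",
}
print(pick_image_url(example_img))
```

In the question's loop this would be used as image = pick_image_url(product.find("img")).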