Problem decoding base64 image data in Python with Beautiful Soup

Question

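I am trying to scrape some data from a website with Python and Beautiful Soup, in particular the product images. However, when I run my code, the image value comes back as a base64 string in an odd format, like this: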

"image": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",

Here is the relevant code snippet:

import requests
from bs4 import BeautifulSoup

def search_mercadolivre_by_category(category):
    url = f"https://lista.mercadolivre.com.br/{category}"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    products = soup.find_all("li", {"class": "ui-search-layout__item"})
    results = []
    for product in products:
        title = product.find("h2", {"class": "ui-search-item__title"}).text.strip()
        price = product.find("span", {"class": "price-tag-fraction"}).text.strip()
        link = product.find("a", {"class": "ui-search-link"})['href']
        image = product.find("img")['src']
        results.append({
            "title": title,
            "price": price,
            "link": link,
            "image": image,
            "category": category,
            "website": "Mercado Livre",
            "keyword": ""
        })
    return results

Can anyone help me decode the image data correctly?

I expected to get the image source that appears in this tag:

<img width="160" height="160" decoding="async" src="https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp" class="ui-search-result-image__element shops__image-element" alt="Samsung Galaxy M13 Dual SIM 128 GB verde 4 GB RAM">

Answer 1

That is a data URI. The simplest way to read it is like this:

from urllib import request

with request.urlopen('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7') as DataURI:
    im = DataURI.read()

If you look at the first few bytes, you can see that it really is a 1x1 GIF image:

print(im[:10])       # prints b'GIF89a\x01\x00\x01\x00'
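
To make the "1x1" reading of those bytes explicit, here is a minimal sketch that unpacks the GIF header with the standard library. It assumes the usual GIF layout (a 6-byte signature followed by width and height as little-endian 16-bit integers) and decodes the payload with `base64` so that it runs on its own; the names `payload`, `signature`, `width` and `height` are illustrative only.

```python
import base64
import struct

# The base64 payload from the data URI in the question (the part after the comma).
payload = "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
im = base64.b64decode(payload)

# Bytes 0-5 hold the signature, bytes 6-9 hold width and height
# as two little-endian unsigned 16-bit integers.
signature = im[:6]
width, height = struct.unpack("<HH", im[6:10])

print(signature, width, height)  # prints b'GIF89a' 1 1
```

A 1x1 GIF like this is typically a placeholder rather than the product picture itself.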

If you want to save it to disk as image.gif, you can use:

from pathlib import Path
Path('image.gif').write_bytes(im)

If you want to open it in PIL, you can wrap it in BytesIO and open it like this:

from PIL import Image
from io import BytesIO

# Open as PIL Image
PILImage = Image.open(BytesIO(im))

PILImage.show()               # display in viewer
PILImage.save('result.png')   # save to disk as PNG
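
If you would rather not go through `urlopen`, the same bytes can be recovered by splitting the data URI by hand. The following is only a sketch: `decode_data_uri` is a hypothetical helper name, and it assumes the URI is base64-encoded, as it is in the question.

```python
import base64

def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, raw_bytes).

    Hypothetical helper for illustration; handles only the base64 form.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = uri[len("data:"):].partition(",")
    mime_type, *params = header.split(";")
    if "base64" not in params:
        raise ValueError("only base64-encoded data URIs are handled here")
    return mime_type, base64.b64decode(payload)

uri = "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
mime_type, data = decode_data_uri(uri)
print(mime_type, len(data))
```

The returned bytes can be passed to `Path(...).write_bytes` or `BytesIO` in the same way as above.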

Answer 2

I think you need:

image = product.find("img")['data-src']
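
Not every `img` tag necessarily carries `data-src`, so a plain `['data-src']` lookup can raise `KeyError`. Below is a rough sketch of a fallback, assuming the page lazy-loads images by putting a placeholder data URI in `src` and the real URL in `data-src`. `pick_image_url` and its `lazy_keys` parameter are hypothetical names, and the `data-src` value in the sample dict is made up from the URL shown in the question. It works on a plain attribute mapping, such as the `.attrs` dict of a Beautiful Soup tag, so it can be tried without fetching anything.

```python
def pick_image_url(attrs, lazy_keys=("data-src",)):
    """Return the first attribute value that is a real URL, not a data: URI.

    `attrs` is a mapping of attribute names to values, for example the
    `.attrs` dict of a Beautiful Soup tag. `lazy_keys` lists attributes to
    try before `src`; which ones a given site uses is an assumption.
    """
    for key in (*lazy_keys, "src"):
        value = attrs.get(key)
        if value and not value.startswith("data:"):
            return value
    return None

# Stand-ins for tag attributes: one lazy-loaded image, one already loaded.
placeholder = {
    "src": "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
    "data-src": "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp",
}
loaded = {"src": "https://http2.mlstatic.com/D_NQ_NP_609104-MLA50695427900_072022-V.webp"}

print(pick_image_url(placeholder))
print(pick_image_url(loaded))
```

In the scraper from the question this would be called as `pick_image_url(product.find("img").attrs)`.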
