ReadTimeout error when downloading images on AWS EC2 but not locally

Question

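I have a Python script that downloads images from URLs and uploads them to AWS S3. When I run the script on my local machine, it works perfectly. However, when I deploy and run the same script on an AWS EC2 instance, I get a ReadTimeout error.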

This is the error I get:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.net-a-porter.com', port=443): Read timed out. (read timeout=100)

Here is the relevant part of my code:

import requests
import tempfile
import os

def upload_image_to_s3_from_url(self, image_url, filename, download_timeout=120):
    """
    Downloads an image from the given URL to a temporary file and uploads it to AWS S3,
    then returns the S3 file URL.
    """
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
            'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8'
        }
        # Request the image
        response = requests.get(image_url, timeout=download_timeout, stream=True, headers=headers)
        response.raise_for_status()
        
        # Determine the content type
        content_type = response.headers.get('Content-Type', 'image/jpeg')  # Default to image/jpeg

        # Create a temporary file (delete=False so it can be reopened by name after closing)
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            # Write the response content to the temporary file
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)

        try:
            # The file is closed (and therefore flushed) here, so the upload reads the complete image
            file_url = self.upload_image_to_s3(tmp_file.name, filename, content_type)
        finally:
            # With delete=False the temporary file must be removed explicitly, even if the upload fails
            os.unlink(tmp_file.name)

        return file_url
    except requests.RequestException as e:
        raise Exception(f"Failed to download or upload image. Error: {e}") from e

# Example URL causing issues
image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"

The problem occurs when downloading images from www.net-a-porter.com. The timeout is set to 120 seconds, which I believe should be more than enough.
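In requests, a single number passed as timeout is applied separately to the connect phase and to the read phase, and the read timeout is the maximum wait between bytes sent by the server, not a cap on the whole download. A ReadTimeout therefore suggests the connection to www.net-a-porter.com was established but the server sent nothing during the whole read window, which would explain why raising the timeout did not help. A minimal diagnostic sketch, assuming requests is installed; fetch and describe_timeout are hypothetical helper names, and the 5 s / 30 s values are arbitrary:

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout


def describe_timeout(exc: Exception) -> str:
    """Map a requests timeout exception to a short diagnosis."""
    # ConnectTimeout: the TCP/TLS connection was not established in time.
    # (Checked first; ConnectTimeout is also a subclass of requests.Timeout.)
    if isinstance(exc, ConnectTimeout):
        return "connect: could not reach the server"
    # ReadTimeout: connected, but no bytes arrived within the read timeout.
    if isinstance(exc, ReadTimeout):
        return "read: connected, but the server did not respond"
    return "other: " + type(exc).__name__


def fetch(url: str, headers: dict, connect_timeout: float = 5, read_timeout: float = 30):
    """Hypothetical helper: fail fast on connect, bound the wait for server data."""
    try:
        # A (connect, read) tuple sets the two timeouts independently.
        response = requests.get(url, headers=headers, stream=True,
                                timeout=(connect_timeout, read_timeout))
        response.raise_for_status()
        return response
    except requests.Timeout as exc:
        raise RuntimeError(f"{url}: {describe_timeout(exc)}") from exc
```

If fetch reports "connect" on the instance, the problem is reachability from EC2; if it reports "read" with the same headers that work locally, that points at the server handling requests from the EC2 address differently rather than at a slow network.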

What I have tried so far:

Increasing the timeout
Changing the User-Agent in the request headers
Running the script at different times of day to rule out server load
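Since a longer timeout did not help, waiting longer on a single request is unlikely to be the fix. If the stalls are intermittent, a few retries with a short read timeout fail faster than one long wait. A sketch assuming requests with urllib3 1.26 or newer (for the allowed_methods argument); make_session is a hypothetical name. Retries will not help if the server consistently refuses to answer requests coming from the instance.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries: int = 3, backoff: float = 1.0) -> requests.Session:
    """Hypothetical helper: a Session that retries timeouts and transient errors."""
    retry = Retry(
        total=retries,
        connect=retries,         # retry failed connection attempts
        read=retries,            # retry read errors, including read timeouts
        backoff_factor=backoff,  # exponentially growing sleep between attempts
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET", "HEAD"}),  # only idempotent methods
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the same retry policy to both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Usage: make_session().get(image_url, headers=headers, stream=True, timeout=(5, 30)).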

Any insights or suggestions on how to resolve this would be greatly appreciated.

Answer 1

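Testing shows that the web server responds when a specific set of headers is added. I'm not sure whether this behavior is intentional. I changed the User-Agent and added extra headers, as below, to see whether it gets a response: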

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive'
}

Could you give this a try?
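To try this inside the original method, one option is to set the headers once on a requests.Session so that every image request sends them. A minimal sketch; ANSWER_HEADERS and make_browser_like_session are hypothetical names, and the header values are copied from the answer. Note that requests/urllib3 only decode a "br" (Brotli) response body when the brotli or brotlicffi package is installed.

```python
import requests

# Header values copied from the answer; whether they help depends on the server.
ANSWER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    # "br" is only decoded if brotli or brotlicffi is installed.
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
}


def make_browser_like_session() -> requests.Session:
    """Hypothetical helper: a Session whose default headers are the answer's set."""
    session = requests.Session()
    # update() merges into the defaults, replacing same-named keys (case-insensitive),
    # so the default "python-requests/..." User-Agent is overridden.
    session.headers.update(ANSWER_HEADERS)
    return session
```

In upload_image_to_s3_from_url, the call would then become session.get(image_url, timeout=download_timeout, stream=True), with the session's headers sent automatically.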
