I have a bucket containing 100,000 images (png/jpg), and I'd like to read all of them in a single S3 request. This is my code:
import boto3
import numpy as np
import torch
from io import BytesIO
from PIL import Image

# getting a connection to S3
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)
# getting the list of all ObjectSummary entries under the relevant prefix - this is a request to the API
crops_path_list = bucket.objects.filter(Prefix=prefix)
# iterating over the list and fetching each object
images = []
for index, crop in enumerate(crops_path_list):
    # getting the bytes from S3 - this is a request to the API
    obj = crop.get()
    # working with the bytes to make them an image
    image_content = obj['Body'].read()
    bytes_image = BytesIO(image_content)
    image = Image.open(bytes_image)
    image = image.convert("RGB")
    image = np.asarray(image)
    image = np.ascontiguousarray(image.transpose(2, 0, 1))
    image = torch.from_numpy(image).unsqueeze(0).to(dtype=torch.float32, device=device)
    # adding to the list of all images
    images.append(image)
It takes a very long time, because every .get() is a request to the API. I can't find any way to fetch the whole crops_path_list in a single request. Other than running threads/subprocesses or zipping the files, any other ideas for how to use less I/O and fewer API requests?
Thanks :)
That is not possible. Downloading each file is a separate request.
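Since each object is necessarily its own GET, the practical speedup is to issue those GETs concurrently rather than sequentially, which hides the per-request latency without reducing the request count. A minimal sketch of that pattern, where `fetch` is a hypothetical callable standing in for the real S3 call (e.g. wrapping `bucket.Object(key).get()['Body'].read()`):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(keys, fetch, max_workers=32):
    """Download many objects concurrently.

    `fetch` maps a key to its contents; each call is still one S3 GET,
    so this only overlaps the network round-trips - it does not merge
    the downloads into a single request.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keys
        return list(pool.map(fetch, keys))

# Against S3 this would look roughly like (names assumed from the question):
#   s3 = boto3.resource('s3')
#   bucket = s3.Bucket(bucket_name)
#   keys = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
#   blobs = fetch_all(keys, lambda k: bucket.Object(k).get()['Body'].read())
```

Because the work is I/O-bound, threads (rather than processes) are usually enough here, and `max_workers` can be tuned to the available bandwidth.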