How to download everything in a folder using boto3

Question

I want to download all the csv files that exist in an S3 folder (2021-02-15). I tried the following, but it failed. How should I do this?

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    client = boto3.client('s3')
    client.download_file(bucket, obj, obj)

ValueError: Filename must be a string

Answer 1

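Marcin's answer is correct, but files with the same name in different paths will overwrite each other. You can avoid this by replicating the S3 bucket's folder structure locally.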

import boto3
import os
from pathlib import Path

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket')

key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))

for obj in objs:
    # print(obj.key)

    # remove the file name from the object key
    obj_path = os.path.dirname(obj.key)

    # create nested directory structure
    Path(obj_path).mkdir(parents=True, exist_ok=True)

    # save file with full path locally
    bucket.download_file(obj.key, obj.key)
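
To check whether the overwriting problem mentioned above would actually affect a given listing, one option is to look for repeated file names among the keys before downloading. This is only a minimal sketch: find_basename_collisions is a hypothetical helper (not part of boto3), and the keys in the example are made up. In practice the list would come from something like [obj.key for obj in objs].

```python
from collections import defaultdict
import posixpath


def find_basename_collisions(keys):
    """Group S3 keys by file name and return only names shared by several keys."""
    by_name = defaultdict(list)
    for key in keys:
        if key.endswith('/'):
            continue  # skip zero-byte "folder" placeholder keys
        by_name[posixpath.basename(key)].append(key)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# Made-up keys for illustration; in practice: keys = [obj.key for obj in objs]
keys = [
    'product/myproject/2021-02-15/a/data.csv',
    'product/myproject/2021-02-15/b/data.csv',
    'product/myproject/2021-02-15/report.csv',
]
print(find_basename_collisions(keys))
```

Here the two data.csv keys share a file name, so saving by file name alone would keep only one of them. An empty result means a flat download is safe for that listing.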

Answer 2

Since you are using the resource interface, you can call download_file on the bucket:

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('bucket')

key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))

for obj in objs:
    #print(obj.key)
    out_name = obj.key.split('/')[-1]
    bucket.download_file(obj.key, out_name)
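
The question asks specifically for csv files, so the same loop can be narrowed with a check on the key suffix. The following is a sketch under assumptions: download_csv_files and dest_dir are hypothetical names introduced here, and bucket is assumed to behave like a boto3 Bucket resource (objects.filter(Prefix=...) and download_file(key, filename)). Like the loop above, it saves by file name only, so equal names in different subfolders would still overwrite each other.

```python
import os


def download_csv_files(bucket, prefix, dest_dir='.'):
    """Download every object under prefix whose key ends in .csv into dest_dir.

    bucket is expected to behave like a boto3 Bucket resource, i.e. to offer
    objects.filter(Prefix=...) and download_file(key, filename).
    """
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.lower().endswith('.csv'):
            continue  # ignore non-CSV objects and "folder" placeholder keys
        out_name = os.path.join(dest_dir, obj.key.split('/')[-1])
        bucket.download_file(obj.key, out_name)
        downloaded.append(out_name)
    return downloaded


# Assumed usage, with the bucket name and prefix from the question:
#   import boto3
#   bucket = boto3.resource('s3').Bucket('bucket')
#   download_csv_files(bucket, 'product/myproject/2021-02-15/', 'csv_files')
```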

Answer 3

You can also use cloudpathlib, which wraps boto3 for S3. For your use case it is quite simple:

from cloudpathlib import CloudPath

cp = CloudPath("s3://bucket/product/myproject/2021-02-15/")
cp.download_to("local_folder")

Answer 4

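filter returns a collection of object summaries, not just names, whereas the client's download_file() method expects the bucket name, the object key, and the local file name as strings. Try this:

objs = list(bucket.objects.filter(Prefix=key))
client = boto3.client('s3')
for obj in objs:
    # pass strings: bucket name, object key, local file name
    client.download_file(bucket.name, obj.key, obj.key.split('/')[-1])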

You can also print obj inside the loop with print(obj) to see what it actually contains.
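
If you prefer to stay with the client throughout, the listing can also be done with the list_objects_v2 paginator, so that every argument passed to download_file is a plain string. This is a sketch rather than a definitive implementation: download_prefix_with_client and dest_dir are hypothetical names, and client is assumed to behave like boto3.client('s3').

```python
import os


def download_prefix_with_client(client, bucket_name, prefix, dest_dir='.'):
    """Download all objects under prefix using the low-level client API.

    client is expected to behave like boto3.client('s3'). All three
    download_file arguments are plain strings: bucket name, key, local path.
    """
    paginator = client.get_paginator('list_objects_v2')
    downloaded = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # a page without matching keys has no 'Contents' entry
        for item in page.get('Contents', []):
            key = item['Key']
            if key.endswith('/'):
                continue  # "folder" placeholder, nothing to download
            local_path = os.path.join(dest_dir, *key.split('/'))
            os.makedirs(os.path.dirname(local_path) or '.', exist_ok=True)
            client.download_file(bucket_name, key, local_path)
            downloaded.append(local_path)
    return downloaded


# Assumed usage, with the bucket name and prefix from the question:
#   import boto3
#   client = boto3.client('s3')
#   download_prefix_with_client(client, 'bucket', 'product/myproject/2021-02-15/')
```

The paginator is used because a single list_objects_v2 call returns at most 1000 keys.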

Answer 5

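Building on Marcello's answer, I found that I had to check whether each item I download from S3 is a file or a "directory". I also tweaked the approach slightly so that the items from S3 are downloaded into a specified local folder.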

import os
from pathlib import Path

import boto3

s3 = boto3.resource('s3')

download_directory = './download_directory'  # replace with your root directory
bucket = s3.Bucket('bucket')  # replace with your bucket
objs = list(bucket.objects.filter(Prefix='folder'))  # replace with your s3 'folder'

for obj in objs:
    if obj.key.endswith('/'):
        continue  # if the obj is a folder / directory then skip it

    local_file_path = os.path.join(download_directory, obj.key)

    # create the local directory structure before downloading
    Path(os.path.dirname(local_file_path)).mkdir(parents=True, exist_ok=True)
    bucket.download_file(obj.key, local_file_path)
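
Joining the full key onto the download directory recreates the whole prefix locally (for example ./download_directory/folder/...). If you would rather have the files land directly under the download directory, the prefix can be stripped first. The helper below is a hypothetical sketch, not part of boto3; it only computes the local path and leaves the download itself to bucket.download_file.

```python
import os


def local_path_for_key(key, prefix, download_directory):
    """Return the local path for an S3 key with the listing prefix removed.

    Returns None for "folder" placeholder keys (keys ending in '/').
    """
    if key.endswith('/'):
        return None
    relative = key[len(prefix):] if key.startswith(prefix) else key
    # drop empty segments, e.g. the leading one left when prefix has no trailing '/'
    parts = [part for part in relative.split('/') if part]
    return os.path.join(download_directory, *parts)


# Made-up key for illustration
print(local_path_for_key('folder/2021/data.csv', 'folder/', './download_directory'))
```

Note that a prefix without a trailing slash, such as 'folder', also matches sibling keys like 'folder2/...', so ending the prefix with '/' is usually the safer choice.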
