I want to download all the CSV files that exist in an S3 folder (2021-02-15). I tried the following, but it failed. What should I do?
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    client = boto3.client('s3')
    client.download_file(bucket, obj, obj)
ValueError: Filename must be a string
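Marcin's answer is correct, but files with the same name in different paths will be overwritten. You can avoid this by replicating the folder structure of the S3 bucket locally.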
import boto3
import os
from pathlib import Path
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    # print(obj.key)
    # remove the file name from the object key
    obj_path = os.path.dirname(obj.key)
    # create nested directory structure
    Path(obj_path).mkdir(parents=True, exist_ok=True)
    # save file with full path locally
    bucket.download_file(obj.key, obj.key)
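To see why flattening every key to its base name is risky, here is a small sketch that runs without boto3 and reports which base names would collide for a list of keys. The function name find_basename_collisions and the sample keys are made up for illustration.

```python
from collections import defaultdict
import posixpath


def find_basename_collisions(keys):
    """Group S3 keys by base name and return only the names shared by several keys."""
    by_name = defaultdict(list)
    for key in keys:
        # S3 keys always use '/' as the separator, so posixpath fits here
        by_name[posixpath.basename(key)].append(key)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}


# hypothetical keys, for illustration only
sample_keys = [
    "product/myproject/2021-02-15/a/data.csv",
    "product/myproject/2021-02-15/b/data.csv",
    "product/myproject/2021-02-15/report.csv",
]
collisions = find_basename_collisions(sample_keys)
print(collisions)
```

Any name that shows up in the result would be overwritten if the files were saved by base name alone, which is what keeping the full key as the local path avoids.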
Since you are using resource, you can use its download_file method:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
key = 'product/myproject/2021-02-15/'
objs = list(bucket.objects.filter(Prefix=key))
for obj in objs:
    # print(obj.key)
    out_name = obj.key.split('/')[-1]
    bucket.download_file(obj.key, out_name)
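Note that the prefix filter returns every object under the folder, while the question asks only for CSV files. Below is a minimal sketch of the extra filter, written against plain key strings so it runs without AWS access. select_csv_keys is a hypothetical helper name; in the loop above you would apply the same suffix test to obj.key before downloading.

```python
def select_csv_keys(keys):
    """Return only the keys that look like CSV files (case-insensitive suffix match)."""
    return [key for key in keys if key.lower().endswith(".csv")]


# hypothetical keys, for illustration only
sample_keys = [
    "product/myproject/2021-02-15/",
    "product/myproject/2021-02-15/sales.csv",
    "product/myproject/2021-02-15/notes.txt",
    "product/myproject/2021-02-15/archive/OLD.CSV",
]
csv_only = select_csv_keys(sample_keys)
print(csv_only)
```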
You can also use cloudpathlib, which wraps boto3 for S3. For your use case, it is quite simple:
from cloudpathlib import CloudPath
cp = CloudPath("s3://bucket/product/myproject/2021-02-15/")
cp.download_to("local_folder")
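filter() returns a collection of object summaries, not just their names, while the client's download_file() method expects strings: the bucket name, the object key, and the local file name.
Try this:
objs = list(bucket.objects.filter(Prefix=key))
client = boto3.client('s3')
for obj in objs:
    # pass strings: bucket name, object key, local file name
    client.download_file(bucket.name, obj.key, obj.key.split('/')[-1])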
You can also call print(obj) inside the loop to see what the obj object actually contains.
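Building on Marcello's answer, I found that I had to check whether the item I was downloading was a file or a "directory" in S3. I also tweaked the approach slightly so that the items from S3 are downloaded into a specified local folder.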
import os
from pathlib import Path
import boto3
s3 = boto3.resource('s3')
download_directory = './download_directory'  # replace with your root directory
bucket = s3.Bucket('bucket')  # replace with your bucket
objs = list(bucket.objects.filter(Prefix='folder'))  # replace with your s3 'folder'
for obj in objs:
    if obj.key.endswith('/'):
        continue  # if the obj is a folder / directory then skip it
    # mirror the object key underneath the local download directory
    local_file_path = os.path.join(download_directory, obj.key)
    # create the parent directories before downloading
    Path(os.path.dirname(local_file_path)).mkdir(parents=True, exist_ok=True)
    bucket.download_file(obj.key, local_file_path)
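The path handling above can be factored into a small pure function that is easy to check without touching S3. This is only a sketch: plan_downloads and its parameters are illustrative names, and unlike the snippet above it optionally strips a leading prefix so the local tree starts directly below the download directory.

```python
import os


def plan_downloads(keys, download_directory, strip_prefix=""):
    """Map S3 keys to local file paths, skipping 'directory' placeholder keys."""
    plan = []
    for key in keys:
        if key.endswith("/"):
            continue  # "folder" marker object, nothing to download
        # drop the optional prefix so it is not repeated in the local tree
        relative = key[len(strip_prefix):] if key.startswith(strip_prefix) else key
        relative = relative.lstrip("/")
        # S3 keys use '/', so split on it and join with the local separator
        local_path = os.path.join(download_directory, *relative.split("/"))
        plan.append((key, local_path))
    return plan


# hypothetical keys, for illustration only
sample_keys = ["folder/", "folder/a.csv", "folder/sub/b.csv"]
plan = plan_downloads(sample_keys, "download_directory", strip_prefix="folder/")
for key, local_path in plan:
    print(key, "->", local_path)
```

Each (key, local_path) pair can then be handed to bucket.download_file(key, local_path) once the parent directory of local_path has been created.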