Listing a large number of S3 bucket objects to a file

Question (votes: 0, answers: 2)

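I have a large number of files (about 10 million) in an S3 bucket, and I want to write the list of those files to a text file for further processing.

The question is how to do this efficiently. Is: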

aws s3 ls s3://bucketname > out.txt

the only option? I only need the file URLs, though. How can I achieve this?

amazon-web-services amazon-s3
2 Answers
3
votes

Have you tried S3 Inventory reports? You will still need some post-processing to get the format you want, though.
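
The post-processing could look roughly like the sketch below. It assumes the report was delivered in CSV format, that the first two columns are the bucket name and the object key (the actual column order is listed in the report's manifest.json), and that keys are percent-encoded in the CSV. The function name `inventory_rows_to_urls` is made up for illustration, and the `s3://bucket/key` form is only a guess at what "file URLs" means here.

```python
import csv
from urllib.parse import unquote


def inventory_rows_to_urls(lines):
    """Yield an s3:// URL for each row of an uncompressed inventory CSV.

    Assumption: column 0 is the bucket name and column 1 is the
    percent-encoded object key; check manifest.json for the real schema.
    """
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        bucket, key = row[0], unquote(row[1])
        yield f"s3://{bucket}/{key}"
```

Since the report files are normally gzip-compressed, you would open each one with `gzip.open(path, "rt", newline="")` and pass the file object in as `lines`. If your report encodes spaces as `+`, swap `unquote` for `unquote_plus`.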


0
votes

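Write a Python shell job that fetches the list of folders and file names from S3. The following Python job collects each folder and its file names, then writes them as a CSV file to another S3 location.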

import csv
import os

import boto3

# Specify the S3 bucket and an optional prefix
s3_bucket = "your_source_bucket"
specific_file = ""  # only needed if you want a specific folder or file

# Create an S3 client
s3_client = boto3.client('s3')

# List objects in the bucket with pagination (each page holds up to 1,000 keys)
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=s3_bucket, Prefix=specific_file)

# Write folder and file names to a local CSV file page by page,
# so the full listing is never held in memory
csv_output_path = "/tmp/s3_files.csv"  # Use a local temporary file
with open(csv_output_path, 'w', newline='') as file:
    writer = csv.writer(file)  # quotes keys that contain commas
    writer.writerow(['Folder', 'File'])
    for response in response_iterator:
        for content in response.get('Contents', []):
            # os.path.split returns ('', key) when the key has no '/'
            folder_name, file_name = os.path.split(content['Key'])
            writer.writerow([folder_name, file_name])

# Upload the CSV file to S3
s3_client.upload_file(csv_output_path, "your_bucket", "your_prefix_and_file_name")

# Clean up the temporary file
os.remove(csv_output_path)
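
Since the question only asks for file URLs, the same paginated listing can be written straight to a text file, one URL per line. This is a minimal sketch, not a tuned solution: `write_object_urls` and `list_bucket_to_file` are hypothetical helper names, and the `s3://bucket/key` form is an assumption about which URL style is wanted.

```python
def write_object_urls(pages, bucket, out):
    """Write one s3:// URL per object to `out`; return the number written.

    `pages` is any iterable of list_objects_v2 response dicts,
    for example the iterator returned by a boto3 paginator.
    """
    count = 0
    for page in pages:
        # 'Contents' is absent from a page when no keys match
        for obj in page.get("Contents", []):
            out.write(f"s3://{bucket}/{obj['Key']}\n")
            count += 1
    return count


def list_bucket_to_file(bucket, path, prefix=""):
    """List `bucket` (optionally under `prefix`) and save the URLs to `path`."""
    import boto3  # imported here so the helper above has no AWS dependency

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    with open(path, "w") as out:
        return write_object_urls(pages, bucket, out)
```

Calling `list_bucket_to_file("your_source_bucket", "out.txt")` would then produce the text file. Each page carries at most 1,000 keys, so 10 million objects means on the order of 10,000 sequential requests. Writing as you go keeps memory use flat, but the run will still take a while.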