How to delete the contents of a folder in an S3 bucket


I am running a Lambda function that queries ALB access logs and sends the output to an S3 bucket. Two queries are executed:

Daily Logs
Monthly Logs

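I created a bucket with two folders, DailyLogs and MonthlyLogs, so that when the Lambda function runs, the logs are stored in their respective folders.

I also added a step that deletes the previous CSV files and replaces them with the newly generated logs. However, when the Lambda function runs, the entire DailyLogs and MonthlyLogs folders are deleted.

I only want to delete the contents of the folders and replace them with the new logs.

Can you help me?

The Lambda code is attached.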

import boto3
import json
import time
from datetime import datetime

# Query string to execute
daily_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE user_agent LIKE '%test%' AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') >= date_parse(date_format(date_add('day', -1, current_date), '%Y-%m-%d'), '%Y-%m-%d') AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') < date_parse(date_format(current_date, '%Y-%m-%d'), '%Y-%m-%d') ORDER BY time ASC"
monthly_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime(CAST(date_trunc('month', current_date) AS varchar), 'yyyy-MM-dd') AND parse_datetime(CAST(current_date AS varchar), 'yyyy-MM-dd') AND user_agent LIKE '%test%' ORDER BY time ASC"

# Database to execute the query against
DATABASE = 'DATABASE'

# Output bucket
bucket_name = 'BUCKET_NAME'

# Initialize Boto3 clients
s3_client = boto3.client('s3')
athena_client = boto3.client('athena')

def lambda_handler(event, context):
    try:
        # Get current date
        current_date = datetime.now()

        # Create folder names for daily and monthly logs
        daily_folder = f"DailyLogs/{current_date.strftime('%Y-%m-%d')}/"
        monthly_folder = f"MonthlyLogs/{current_date.strftime('%Y-%m')}/"

        # Delete existing files in the S3 bucket
        delete_daily_files(daily_folder)
        delete_monthly_files(monthly_folder)

        # Start the query executions
        response = athena_client.start_query_execution(
            QueryString=daily_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{daily_folder}'}
        )

        response = athena_client.start_query_execution(
            QueryString=monthly_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{monthly_folder}'}
        )

        return response
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}

def delete_daily_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})

def delete_monthly_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})

Please find the code attached. Instead of deleting only the files, it deletes the entire folder.

python amazon-web-services amazon-s3 aws-lambda
1 Answer

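You can tell whether an object is a folder by checking whether its key ends with a slash. S3 has no real folders: a folder created in the console is a zero-byte object whose key ends with "/", so listing by prefix returns it along with the files, and deleting everything in the listing deletes the folder too.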

if not obj['Key'].endswith('/')
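To see what that check does, here is a minimal sketch that applies it to a hand-written list of keys. The key names and the helper name split_folder_markers are made up for illustration; the sketch assumes the folder was created in the S3 console, so it exists as a placeholder object whose key ends with a slash.

```python
def split_folder_markers(keys):
    """Separate folder placeholder keys (ending in '/') from real file keys."""
    markers = [k for k in keys if k.endswith("/")]
    files = [k for k in keys if not k.endswith("/")]
    return markers, files


# Hypothetical listing for Prefix="DailyLogs/"; these key names are illustrative only.
sample_keys = [
    "DailyLogs/",                     # placeholder object that represents the folder
    "DailyLogs/report.csv",
    "DailyLogs/report.csv.metadata",
]

markers, files = split_folder_markers(sample_keys)
print(markers)  # ['DailyLogs/']
print(files)    # ['DailyLogs/report.csv', 'DailyLogs/report.csv.metadata']
```

Only the keys in files would be passed to delete_objects, so the placeholder survives and the folder stays visible.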

Here is a function that should solve the problem:

def delete_files_in_folder(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if "Contents" in response:
        keys_to_delete = [
            {"Key": obj["Key"]}
            for obj in response["Contents"]
            if not obj["Key"].endswith("/")
        ]
        if keys_to_delete:
            s3_client.delete_objects(
                Bucket=bucket_name, Delete={"Objects": keys_to_delete}
            )
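
One limitation of the function above: list_objects_v2 returns at most 1,000 keys per call, so a folder holding more objects than that would only be partly emptied. The sketch below uses the boto3 paginator to walk every page. Since delete_objects also accepts at most 1,000 keys per request, deleting page by page stays within that limit. The function name, and the choice to pass the client and bucket as arguments, are my own assumptions and not part of the original answer.

```python
def delete_all_files_in_folder(s3_client, bucket, folder):
    """Delete every object under `folder` except keys ending in '/'.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the number of keys submitted for deletion.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=folder):
        keys = [
            {"Key": obj["Key"]}
            for obj in page.get("Contents", [])  # empty pages have no 'Contents'
            if not obj["Key"].endswith("/")      # keep folder placeholder objects
        ]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

In the Lambda from the question this could be called as delete_all_files_in_folder(s3_client, bucket_name, daily_folder), and again with monthly_folder, which would replace the two identical delete functions.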