How do I use bucketing with AWS Glue when writing to S3?


I'm trying to both partition and bucket data on S3 with AWS Glue, but bucketing has no effect; only partitioning works. How can I get bucketing to work with AWS Glue?

datasink4 = glueContext.write_dynamic_frame.from_options(
    frame = dropnullfields3,
    connection_type = "s3",
    connection_options = {"path": s3_output_full,
                          # partitioning works as expected...
                          "partitionKeys": ["PARTITIONKEY"],
                          # ...but these two options appear to be ignored
                          "bucketColumns": ["ROW_ID"],
                          "numberOfBuckets": 12},
    format = "parquet",
    transformation_ctx = "datasink4")

job.commit()
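For background on what bucketing would do here: each row is assigned to one of a fixed number of bucket files by hashing the bucket column modulo the bucket count, so equal keys always land in the same file. A minimal Python sketch of the idea (Spark's real bucketing uses a Murmur3-based hash; the MD5 digest below is only a deterministic stand-in for illustration):

```python
import hashlib

def bucket_for(value, num_buckets):
    """Assign a value to a bucket by hashing it modulo the bucket count.

    Illustration only: Spark uses a Murmur3 hash internally, not MD5.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

# Equal keys always map to the same bucket file, which is what lets
# bucketed tables join without a full shuffle.
rows = ["row-1", "row-2", "row-1", "row-3"]
buckets = [bucket_for(r, 12) for r in rows]
assert buckets[0] == buckets[2]  # same ROW_ID, same bucket
```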
Tags: python, amazon-web-services, aws-glue, bucket
1 Answer

I don't think those bucketing options are supported by Glue's S3 sink yet.

My script uses Spark's bucketBy function instead; however, it replaces the existing data at the defined path:

# write the DataFrame as a bucketed, partitioned table;
# bucketBy requires saveAsTable, it cannot be used with a plain save()
df_name, job_df = (str(transform_name), df)
datasink_path = "s3://sink-bucket/job-data/"
writing = job_df.write.format('parquet').mode("append") \
                          .partitionBy('event_day') \
                          .bucketBy(3, 'bucketed_field') \
                          .saveAsTable(df_name, path=datasink_path)