AWS Glue - connection timed out error during job run


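I am trying to create an AWS Glue job that moves data from a Redshift cluster to DynamoDB. The connection is established, but I get the following error:

An error occurred while calling o160.pyWriteDynamicFrame. Unable to execute HTTP request: Connect to dynamodb.us-east-1.amazonaws.com:443 [dynamodb.us-east-1.amazonaws.com/52.119.535.345] failed: connect timed out

The Glue connection itself is fine and the crawler works, but I don't know why this error occurs. The Redshift cluster is in Availability Zone us-east-1b, so I set the subnet to the matching one.

I followed this link: https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/ and added the connection, but I still get the error.

The Glue script is as follows: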

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)
    
    # Script generated for node Redshift Cluster
    RedshiftCluster_node1 = glueContext.create_dynamic_frame.from_catalog(
        database="redshift_bbd",
        redshift_tmp_dir=args["TempDir"],
        table_name="financial_data",
        transformation_ctx="RedshiftCluster_node1",
    )
    
    # Script generated for node ApplyMapping
    ApplyMapping_node2 = ApplyMapping.apply(
        frame=RedshiftCluster_node1,
        mappings=[
            ("units_7d", "int", "units_7d", "int"),
            ("pcogs_total_13w", "decimal", "pcogs_total_13w", "decimal"),
            (
                "npp_contra_cogs_13w_total",
                "decimal",
                "npp_contra_cogs_13w_total",
                "decimal",
            ),
            ("revenue_7d", "decimal", "revenue_7d", "decimal"),
            ("asin", "string", "asin", "string"),
            ("netppm_4w", "decimal", "netppm_4w", "decimal"),
        ],
        transformation_ctx="ApplyMapping_node2",
    )
    
    # Script generated for node DynamoDB bucket
    Datasink1 = glueContext.write_dynamic_frame_from_options(
        frame=ApplyMapping_node2,
        connection_type="dynamodb",
        connection_options={
            "dynamodb.output.tableName": "FINANCIAL_DATA",
            "dynamodb.throughput.write.percent": "1.0"
        }
    )
    
    job.commit()
amazon-web-services amazon-s3 amazon-dynamodb amazon-redshift aws-glue
1 Answer

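It turned out that my Glue job had no network path to DynamoDB. I added VPC endpoints for both S3 and DynamoDB (adding only one of them was not enough), and the job succeeded. For more information: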

https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-ddb.html

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/vpc-endpoints-dynamodb-tutorial.html
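
For anyone who prefers to script this rather than use the console, here is a minimal sketch of how the two endpoints might be created with boto3. It assumes gateway-type endpoints and the us-east-1 region from the error message. The function names and the VPC and route table IDs in the usage note are hypothetical placeholders, not values from my setup.

```python
from typing import Any, Dict, List

# S3 and DynamoDB both support gateway endpoints; the job needed both.
GATEWAY_SERVICES = ("s3", "dynamodb")


def gateway_endpoint_requests(
    vpc_id: str, route_table_ids: List[str], region: str = "us-east-1"
) -> List[Dict[str, Any]]:
    """Build keyword arguments for EC2 create_vpc_endpoint, one per service."""
    return [
        {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            # Route tables used by the subnet that the Glue connection runs in.
            "RouteTableIds": list(route_table_ids),
        }
        for service in GATEWAY_SERVICES
    ]


def create_gateway_endpoints(
    ec2_client: Any,
    vpc_id: str,
    route_table_ids: List[str],
    region: str = "us-east-1",
) -> List[str]:
    """Create both endpoints with the given EC2 client and return their IDs."""
    endpoint_ids = []
    for request in gateway_endpoint_requests(vpc_id, route_table_ids, region):
        response = ec2_client.create_vpc_endpoint(**request)
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

To use it, pass a real client, for example `create_gateway_endpoints(boto3.client("ec2", region_name="us-east-1"), "vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"])`, replacing the placeholder IDs with the VPC and route table of the subnet your Glue connection uses.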
