Athena (Presto) view with complex column types


I am trying to create an Athena view that is managed with CloudFormation. The view contains a list of nested record attributes.

Running the SELECT directly in Athena works fine.

SELECT
    item_id AS material_id,
    material_type AS material_type,
    material_group AS material_group,
    material_status AS x_plant_mat_stat,
    products[1].PRODUCT_NO AS product_nr,
    products[1].VERSION AS product_version,
    products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
    products[1].DELETED_DATE AS global_deleted_date,
    transform(
        warehouses,
        plant -> CAST(ROW(
            plant.WAREHOUSE,
            plant.PLANT_SPECIFIC_MAT_STATUS,
            plant.PROCUREMENT_TYPE
        ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
    ) AS plants
FROM raw_item_master LIMIT 5

But when I try it with the following CloudFormation snippet,

    View:
        Type: "AWS::Glue::Table"
        Properties:
            CatalogId: !Ref "AWS::AccountId"
            DatabaseName: !Ref "GlueDatabaseName"
            TableInput:
                TableType: "VIRTUAL_VIEW"
                Name: "item_master"
                Parameters:
                    presto_view: true
                StorageDescriptor:
                    SerdeInfo: {}
                    Columns:
                        -
                            Name: "material_id"
                            Type: "string"
                        -
                            Name: "material_type"
                            Type: "string"
                        -
                            Name: "material_group"
                            Type: "string"
                        -
                            Name: "x_plant_mat_stat"
                            Type: "string"
                        -
                            Name: "product_nr"
                            Type: "string"
                        -
                            Name: "product_version"
                            Type: "string"
                        -
                            Name: "supply_chain_owner"
                            Type: "string"
                        -
                            Name: "global_deleted_date"
                            Type: "string"
                        -
                            Name: "plants"
                            Type: "array<struct<plant_id:string,ps_material_stat:string,proc_type:string>>"
                ViewOriginalText:
                    "Fn::Sub":
                        - "/* Presto View: ${View} */"
                        -
                            View:
                                "Fn::Base64": !Sub '
                                    {
                                        "catalog": "awsdatacatalog",
                                        "schema": "${GlueDatabaseName}",
                                        "columns": [
                                            {
                                                "name": "material_id",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "material_type",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "material_group",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "x_plant_mat_stat",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "product_nr",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "product_version",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "supply_chain_owner",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "global_deleted_date",
                                                "type": "varchar"
                                            },
                                            {
                                                "name": "plants",
                                                "type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"
                                            }
                                        ],
                                        "originalSql": "SELECT
                                                item_id AS material_id,
                                                material_type AS material_type,
                                                material_group AS material_group,
                                                material_status AS x_plant_mat_stat,
                                                products[1].PRODUCT_NO AS product_nr,
                                                products[1].VERSION AS product_version,
                                                products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
                                                products[1].DELETED_DATE AS global_deleted_date,
                                                transform(
                                                    warehouses,
                                                    plant -> CAST(ROW(
                                                        plant.WAREHOUSE,
                                                        plant.PLANT_SPECIFIC_MAT_STATUS,
                                                        plant.PROCUREMENT_TYPE
                                                    ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
                                                ) AS plants
                                            FROM ${RawTable}"
                                    }'

I get the following error in Athena.

INVALID_VIEW: Invalid view JSON: # here comes my JSON

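However, when I select only one attribute, it works (the field type is "type": "array(row(plant_id varchar))", and the cast becomes CAST(ROW(plant.WAREHOUSE) AS ROW(plant_id varchar))). The view works with any of the attributes, but only with one at a time: as soon as I add a second attribute, it breaks in Athena.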

amazon-cloudformation presto amazon-athena
1 answer

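After creating the view from Athena and retrieving it with aws glue get-table, I compared my input with Athena's output. The only difference was the whitespace in the column definition.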

My input (with a space after each comma):

"type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"

Athena (no spaces):

"type": "array(row(plant_id varchar,ps_material_stat varchar,proc_type varchar))"
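A comparison like this can be scripted. The following is a minimal sketch, assuming the view text has the `/* Presto View: <base64> */` shape used in the CloudFormation snippet in the question, and that the two strings come from the `Table.ViewOriginalText` field of `aws glue get-table` output. The function names (`decode_presto_view`, `column_types`, `diff_column_types`) are hypothetical helpers, not part of any AWS tooling.

```python
import base64
import json

# Wrapper text around the base64 payload, as seen in the question's template.
PREFIX = "/* Presto View: "
SUFFIX = " */"


def decode_presto_view(view_original_text: str) -> dict:
    """Return the JSON definition embedded in a ViewOriginalText string."""
    text = view_original_text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("not a Presto view definition")
    payload = text[len(PREFIX):-len(SUFFIX)]
    return json.loads(base64.b64decode(payload))


def column_types(view_original_text: str) -> dict:
    """Map each column name to its type string exactly as stored."""
    definition = decode_presto_view(view_original_text)
    return {col["name"]: col["type"] for col in definition["columns"]}


def diff_column_types(mine: str, athenas: str) -> dict:
    """Return {column: (my_type, athena_type)} for columns whose types differ."""
    a = column_types(mine)
    b = column_types(athenas)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Because the type strings are compared byte for byte, a difference that is only whitespace (as in this case) still shows up in the result.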

After removing the spaces, it worked.
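If the view text is generated by a script instead of being written by hand in the template, the whitespace can be normalized before encoding. This is only a sketch under assumptions: `normalize_presto_type` and `build_view_original_text` are hypothetical helpers, the JSON keys mirror the ones in the question, and dropping whitespace after every comma is a guess at what Athena expects based solely on the one observed type string above.

```python
import base64
import json
import re


def normalize_presto_type(type_str: str) -> str:
    """Remove whitespace that directly follows a comma in a type string."""
    return re.sub(r",\s+", ",", type_str)


def build_view_original_text(schema, columns, original_sql,
                             catalog="awsdatacatalog"):
    """Build the '/* Presto View: <base64> */' text for a Glue table.

    columns is a list of (name, presto_type) pairs.
    """
    definition = {
        "originalSql": original_sql,
        "catalog": catalog,
        "schema": schema,
        "columns": [
            {"name": name, "type": normalize_presto_type(presto_type)}
            for name, presto_type in columns
        ],
    }
    # json.dumps escapes newlines inside the SQL, so the payload is valid JSON.
    raw = json.dumps(definition).encode("utf-8")
    return "/* Presto View: " + base64.b64encode(raw).decode("ascii") + " */"
```

For example, `normalize_presto_type("array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))")` returns the spaceless form shown in Athena's output above.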
