I am trying to create an Athena view that is managed with CloudFormation. The view contains a list of nested record attributes.
Running the SELECT directly in Athena works fine.
SELECT
  item_id AS material_id,
  material_type AS material_type,
  material_group AS material_group,
  material_status AS x_plant_mat_stat,
  products[1].PRODUCT_NO AS product_nr,
  products[1].VERSION AS product_version,
  products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
  products[1].DELETED_DATE AS global_deleted_date,
  transform(
    warehouses,
    plant -> CAST(ROW(
      plant.WAREHOUSE,
      plant.PLANT_SPECIFIC_MAT_STATUS,
      plant.PROCUREMENT_TYPE
    ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
  ) AS plants
FROM raw_item_master LIMIT 5
But when I try to create the same view with the following CloudFormation snippet,
View:
  Type: "AWS::Glue::Table"
  Properties:
    CatalogId: !Ref "AWS::AccountId"
    DatabaseName: !Ref "GlueDatabaseName"
    TableInput:
      TableType: "VIRTUAL_VIEW"
      Name: "item_master"
      Parameters:
        presto_view: true
      StorageDescriptor:
        SerdeInfo: {}
        Columns:
          -
            Name: "material_id"
            Type: "string"
          -
            Name: "material_type"
            Type: "string"
          -
            Name: "material_group"
            Type: "string"
          -
            Name: "x_plant_mat_stat"
            Type: "string"
          -
            Name: "product_nr"
            Type: "string"
          -
            Name: "product_version"
            Type: "string"
          -
            Name: "supply_chain_owner"
            Type: "string"
          -
            Name: "global_deleted_date"
            Type: "string"
          -
            Name: "plants"
            Type: "array<struct<plant_id:string,ps_material_stat:string,proc_type:string>>"
      ViewOriginalText:
        "Fn::Sub":
          - "/* Presto View: ${View} */"
          -
            View:
              "Fn::Base64": !Sub '
                {
                  "catalog": "awsdatacatalog",
                  "schema": "${GlueDatabaseName}",
                  "columns": [
                    {
                      "name": "material_id",
                      "type": "varchar"
                    },
                    {
                      "name": "material_type",
                      "type": "varchar"
                    },
                    {
                      "name": "material_group",
                      "type": "varchar"
                    },
                    {
                      "name": "x_plant_mat_stat",
                      "type": "varchar"
                    },
                    {
                      "name": "product_nr",
                      "type": "varchar"
                    },
                    {
                      "name": "product_version",
                      "type": "varchar"
                    },
                    {
                      "name": "supply_chain_owner",
                      "type": "varchar"
                    },
                    {
                      "name": "global_deleted_date",
                      "type": "varchar"
                    },
                    {
                      "name": "plants",
                      "type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"
                    }
                  ],
                  "originalSql": "SELECT
                    item_id AS material_id,
                    material_type AS material_type,
                    material_group AS material_group,
                    material_status AS x_plant_mat_stat,
                    products[1].PRODUCT_NO AS product_nr,
                    products[1].VERSION AS product_version,
                    products[1].SUPPL_CHAIN_OWNERSHIP AS supply_chain_owner,
                    products[1].DELETED_DATE AS global_deleted_date,
                    transform(
                      warehouses,
                      plant -> CAST(ROW(
                        plant.WAREHOUSE,
                        plant.PLANT_SPECIFIC_MAT_STATUS,
                        plant.PROCUREMENT_TYPE
                      ) AS ROW(plant_id varchar, ps_material_stat varchar, proc_type varchar))
                    ) AS plants
                  FROM ${RawTable}"
                }'
I get the following error in Athena.
INVALID_VIEW: Invalid view JSON: # here comes my JSON
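However, when I select only one attribute, it works: the field type is "type": "array(row(plant_id varchar))" and the cast becomes CAST(ROW(plant.WAREHOUSE) AS ROW(plant_id varchar)). The view works with any of the attributes, but only with one at a time; as soon as I add a second attribute, it breaks in Athena.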
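After creating the view from Athena and reading it back with aws glue get-table, I compared my input with Athena's output. The only difference is the whitespace in the column definition.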
My input (with a space after each comma):
"type": "array(row(plant_id varchar, ps_material_stat varchar, proc_type varchar))"
Athena (no spaces):
"type": "array(row(plant_id varchar,ps_material_stat varchar,proc_type varchar))"
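A comparison like this can be scripted. The following is a minimal sketch, assuming the `ViewOriginalText` value has already been copied out of the `aws glue get-table` output and that it has the `/* Presto View: <base64> */` shape used in the template above. The helper names `decode_presto_view` and `differing_column_types` are made up for illustration.

```python
import base64
import json

# Marker that wraps the base64 payload, as used in the template above.
PREFIX = "/* Presto View: "
SUFFIX = " */"


def decode_presto_view(view_original_text: str) -> dict:
    """Return the JSON document embedded in a ViewOriginalText string."""
    text = view_original_text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("ViewOriginalText does not look like a Presto view")
    payload = text[len(PREFIX):-len(SUFFIX)]
    decoded = base64.b64decode(payload).decode("utf-8")
    # strict=False tolerates raw newlines inside JSON strings such as originalSql.
    return json.loads(decoded, strict=False)


def differing_column_types(mine: dict, theirs: dict) -> dict:
    """Map column name -> (my type, their type) for every column that differs."""
    mine_types = {col["name"]: col["type"] for col in mine["columns"]}
    their_types = {col["name"]: col["type"] for col in theirs["columns"]}
    return {
        name: (mine_types[name], their_types.get(name))
        for name in mine_types
        if mine_types[name] != their_types.get(name)
    }
```

Calling `differing_column_types` on the decoded view from the template and the decoded view created by Athena should then list only the `plants` column, if the whitespace really is the only difference.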
After removing the spaces, it works.
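If the view JSON is generated rather than written by hand, the type strings can be compacted before they go into the template. This is only a sketch of the workaround observed here (no whitespace after commas in the `type` field); it does not explain why Athena rejects the spaced form, and the function names are hypothetical.

```python
import re


def compact_presto_type(type_str: str) -> str:
    """Remove whitespace that directly follows a comma in a Presto type string."""
    return re.sub(r",\s+", ",", type_str)


def compact_view_columns(view: dict) -> dict:
    """Return a copy of the view JSON with every column type compacted."""
    compacted = dict(view)
    compacted["columns"] = [
        {**col, "type": compact_presto_type(col["type"])}
        for col in view["columns"]
    ]
    return compacted
```

The spaces between a field name and its type (for example `plant_id varchar`) are left alone, because only whitespace following a comma is removed.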