How do I update the data type of a Glue column using the AWS CLI?


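I know I can easily do this in the AWS Glue console, but I'm trying to do it with the AWS CLI. I have a table `my_table_name` with a column `id` whose type is currently `string`, and I want to change the type to `bigint`.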

My current attempt is the code below. First, I get `tableinput` from `get-table` and change the third column (`id`) to `bigint`. Then I update the Glue table with the modified `tableinput`, like this:

#!/bin/bash
tableinput=$( aws glue get-table \
                        --database-name $databasename \
                        --name $tablename \
                        | json Table \
                        | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )
aws glue update-table \
    --database-name $databasename \
    --name $tablename \
    --table-input $tableinput

For reference, `echo $tableinput` gives me this JSON:

{ "Name": "my_table_name", "DatabaseName": "my_database_name", "CreateTime": "my_date", "UpdateTime": "my_date", "Retention": 0, "StorageDescriptor": { "Columns": [ { "Name": "kind", "Type": "string" }, { "Name": "etag", "Type": "string" }, { "Name": "id", "Type": "bigint" }, { "Name": "snippet_channelid", "Type": "string" }, { "Name": "snippet_title", "Type": "string" }, { "Name": "snippet_assignable", "Type": "boolean" } ], "Location": "my_location", "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", "Compressed": true, "NumberOfBuckets": -1, "SerdeInfo": { "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", "Parameters": { "serialization.format": "1" } }, "BucketColumns": [], "SortColumns": [], "Parameters": { "CrawlerSchemaDeserializerVersion": "1.0", "classification": "parquet", "compressionType": "snappy", "typeOfData": "file" }, "StoredAsSubDirectories": false }, "PartitionKeys": [], "TableType": "EXTERNAL_TABLE", "Parameters": { "classification": "parquet", "compressionType": "snappy", "projection.enabled": "false", "typeOfData": "file" }, "CreatedBy": "my_role", "IsRegisteredWithLakeFormation": false, "CatalogId": "my_catalog_id", "VersionId": "0" }

However, I get this error:

Unknown options: --name, "Name":, "my_table_name",, "DatabaseName":, "my_database_name",, "CreateTime":, "my_date",, "UpdateTime":, "my_date",, "Retention":, 0,, "StorageDescriptor":, {, "Columns":, [, {, "Name":, "kind",, "Type":, "string", },, {, "Name":, "etag",, "Type":, "string", },, {, "Name":, "id",, "Type":, "bigint", },, {, "Name":, "snippet_channelid",, "Type":, "string", },, {, "Name":, "snippet_title",, "Type":, "string", },, {, "Name":, "snippet_assignable",, "Type":, "boolean", }, ],, "Location":, "s3://my_location",, "InputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",, "OutputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",, "Compressed":, true,, "NumberOfBuckets":, -1,, "SerdeInfo":, {, "SerializationLibrary":, "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",, "Parameters":, {, "serialization.format":, "1", }, },, "BucketColumns":, [],, "SortColumns":, [],, "Parameters":, {, "CrawlerSchemaDeserializerVersion":, "1.0",, "classification":, "parquet",, "compressionType":, "snappy",, "typeOfData":, "file", },, "StoredAsSubDirectories":, false, },, "PartitionKeys":, [],, "TableType":, "EXTERNAL_TABLE",, "Parameters":, {, "classification":, "parquet",, "compressionType":, "snappy",, "projection.enabled":, "false",, "typeOfData":, "file", },, "CreatedBy":, "my_role",, "IsRegisteredWithLakeFormation":, false,, "CatalogId":, "my_catalog_id",, "VersionId":, "0", }, my_table_name

Removing the `--name` option from `update-table` gives me:

aws.exe: error: the following arguments are required: --name
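The long "Unknown options" list is what an unquoted variable expansion looks like: the shell splits the JSON in `$tableinput` at every space, so `aws` receives dozens of separate arguments instead of one JSON document. The sketch below reproduces this with a shortened stand-in JSON string and a hypothetical `count_args` helper that only reports how many arguments it was given; neither is part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Shows why an unquoted $tableinput breaks the aws command line:
# word splitting turns one JSON string into many arguments.

# Hypothetical helper: prints the number of arguments it received.
count_args() {
  echo "$#"
}

# Shortened stand-in for the get-table JSON (not real CLI output).
tableinput='{ "Name": "my_table_name", "Retention": 0 }'

# Unquoted: one argument per whitespace-separated token.
unquoted=$(count_args $tableinput)

# Quoted: the whole JSON document stays a single argument.
quoted=$(count_args "$tableinput")

echo "unquoted: $unquoted, quoted: $quoted"   # prints "unquoted: 6, quoted: 1"
```

The same applies to `--table-input $tableinput` in the question: only the quoted form `"$tableinput"` hands the CLI a single JSON value.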

amazon-web-services aws-cli aws-glue
1 Answer

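First, you need to reshape the output of `get-table` by deleting the fields that `get-table` returns but `update-table` does not accept in `--table-input`: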

# Keep only .Table, drop the fields TableInput does not accept,
# then change the third column (id) to bigint.
tableinput=$( aws glue get-table \
                        --database-name "$databasename" \
                        --name "$tablename" \
                        | jq -r ".Table" \
                        | jq "del(.DatabaseName,.CreateTime,.UpdateTime,.CreatedBy,.IsRegisteredWithLakeFormation,.CatalogId,.VersionId)" \
                        | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )
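The pipeline above mixes `jq` with the `json` tool and hard-codes `Columns[2]`, which silently edits the wrong column if the schema order changes. As an alternative sketch, assuming `jq` is installed, a single `jq` filter can drop the fields and select the column by name. `reshape_table_input` is a hypothetical helper name, and the list of deleted fields is the one from this answer; newer CLI versions may return additional fields that `TableInput` also rejects.

```shell
#!/usr/bin/env bash
# Sketch, assuming jq is installed. reshape_table_input is a hypothetical
# helper: it reads `aws glue get-table` output on stdin and prints a
# TableInput document on stdout.
#
# Usage: reshape_table_input COLUMN NEW_TYPE < get-table-output.json
#   .Table        keep only the table definition
#   del(...)      drop fields that get-table returns but TableInput rejects
#   |= map(...)   set the type of every column named COLUMN
reshape_table_input() {
  local column=$1 new_type=$2
  jq --arg col "$column" --arg type "$new_type" '
    .Table
    | del(.DatabaseName, .CreateTime, .UpdateTime, .CreatedBy,
          .IsRegisteredWithLakeFormation, .CatalogId, .VersionId)
    | .StorageDescriptor.Columns |= map(if .Name == $col then .Type = $type else . end)'
}
```

Usage with the question's variables: `tableinput=$(aws glue get-table --database-name "$databasename" --name "$tablename" | reshape_table_input id bigint)`. Matching on `.Name` keeps the edit correct even if a crawler later reorders the columns.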

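Then you can drop `--name`: `update-table` has no such option and takes the table name from the `Name` field of `--table-input`. With the reshaped JSON it no longer complains, as long as `$tableinput` is quoted so the shell passes the whole JSON document as a single argument (unquoted, it is split at every space, which is what produced the long "Unknown options" list in the question):

aws glue update-table \
    --database-name "$databasename" \
    --table-input "$tableinput"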
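After `update-table` returns, it can be worth reading the column type back to confirm the change. The sketch below assumes `jq` is installed and that the caller has `glue:UpdateTable` and `glue:GetTable` permissions; `update_and_verify` is a hypothetical helper name, not an AWS CLI command. The `--query` expression is JMESPath, which the AWS CLI accepts on every command.

```shell
#!/usr/bin/env bash
# Sketch: apply a prepared TableInput, then print the stored column type.
# update_and_verify is a hypothetical helper name.
set -euo pipefail

# Usage: update_and_verify DATABASE TABLE_INPUT_JSON COLUMN
update_and_verify() {
  local database=$1 tableinput=$2 column=$3 table
  # update-table has no --name option; read the name from the JSON itself
  # so the get-table call below can use it.
  table=$(printf '%s' "$tableinput" | jq -r '.Name')

  # Quoted expansion: the JSON document reaches the CLI as one argument.
  aws glue update-table --database-name "$database" --table-input "$tableinput"

  # Print the type Glue now stores for COLUMN (JMESPath filter + first match).
  aws glue get-table --database-name "$database" --name "$table" \
    --query "Table.StorageDescriptor.Columns[?Name=='$column'].Type | [0]" \
    --output text
}
```

For the question's table, `update_and_verify "$databasename" "$tableinput" id` should print `bigint` once the update has been applied.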
© www.soinside.com 2019 - 2024. All rights reserved.