I know I could easily do this through the AWS Glue console, but I'm trying to do it via the AWS CLI instead. I have a table my_table_name that contains a column id whose current type is string, and I want to change that type to bigint.

My current attempt is the code below. First I take the TableInput from get-table and change the third column (id) to bigint; then I update the Glue table with the modified TableInput, like this:
#!/bin/bash
tableinput=$( aws glue get-table \
    --database-name $databasename \
    --name $tablename \
    | json Table \
    | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )

aws glue update-table \
    --database-name $databasename \
    --name $tablename \
    --table-input $tableinput
For reference, echo $tableinput gives me this JSON:
{
  "Name": "my_table_name",
  "DatabaseName": "my_database_name",
  "CreateTime": "my_date",
  "UpdateTime": "my_date",
  "Retention": 0,
  "StorageDescriptor": {
    "Columns": [
      { "Name": "kind", "Type": "string" },
      { "Name": "etag", "Type": "string" },
      { "Name": "id", "Type": "bigint" },
      { "Name": "snippet_channelid", "Type": "string" },
      { "Name": "snippet_title", "Type": "string" },
      { "Name": "snippet_assignable", "Type": "boolean" }
    ],
    "Location": "my_location",
    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "Compressed": true,
    "NumberOfBuckets": -1,
    "SerdeInfo": {
      "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
      "Parameters": { "serialization.format": "1" }
    },
    "BucketColumns": [],
    "SortColumns": [],
    "Parameters": {
      "CrawlerSchemaDeserializerVersion": "1.0",
      "classification": "parquet",
      "compressionType": "snappy",
      "typeOfData": "file"
    },
    "StoredAsSubDirectories": false
  },
  "PartitionKeys": [],
  "TableType": "EXTERNAL_TABLE",
  "Parameters": {
    "classification": "parquet",
    "compressionType": "snappy",
    "projection.enabled": "false",
    "typeOfData": "file"
  },
  "CreatedBy": "my_role",
  "IsRegisteredWithLakeFormation": false,
  "CatalogId": "my_catalog_id",
  "VersionId": "0"
}
However, I get this error:
Unknown options: --name, "Name":, "my_table_name",, "DatabaseName":, "my_database_name",, "CreateTime":, "my_date",, "UpdateTime":, "my_date",, "Retention":, 0,, "StorageDescriptor":, {, "Columns":, [, {, "Name":, "kind",, "Type":, "string", },, {, "Name":, "etag",, "Type":, "string", },, {, "Name":, "id",, "Type":, "bigint", },, {, "Name":, "snippet_channelid",, "Type":, "string", },, {, "Name":, "snippet_title",, "Type":, "string", },, {, "Name":, "snippet_assignable",, "Type":, "boolean", }, ],, "Location":, "s3://my_location",, "InputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",, "OutputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",, "Compressed":, true,, "NumberOfBuckets":, -1,, "SerdeInfo":, {, "SerializationLibrary":, "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",, "Parameters":, {, "serialization.format":, "1", }, },, "BucketColumns":, [],, "SortColumns":, [],, "Parameters":, {, "CrawlerSchemaDeserializerVersion":, "1.0",, "classification":, "parquet",, "compressionType":, "snappy",, "typeOfData":, "file", },, "StoredAsSubDirectories":, false, },, "PartitionKeys":, [],, "TableType":, "EXTERNAL_TABLE",, "Parameters":, {, "classification":, "parquet",, "compressionType":, "snappy",, "projection.enabled":, "false",, "typeOfData":, "file", },, "CreatedBy":, "my_role",, "IsRegisteredWithLakeFormation":, false,, "CatalogId":, "my_catalog_id",, "VersionId":, "0", }, my_table_name
Removing the --name option from update-table gives me:

aws.exe: error: the following arguments are required: --name
First, you need to reshape the output of get-table by deleting all of the keys that show up in the error message. They are returned by get-table, but they are not allowed in the TableInput structure that update-table accepts:
tableinput=$( aws glue get-table \
    --database-name $databasename \
    --name $tablename \
    | jq -r ".Table" \
    | jq "del(.DatabaseName,.CreateTime,.UpdateTime,.CreatedBy,.IsRegisteredWithLakeFormation,.CatalogId,.VersionId)" \
    | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )
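As a side note, the final json step can be folded into the same jq program, and the column can be selected by name rather than by a hard-coded index, which is safer if the column order ever changes. A minimal sketch, assuming jq is installed and run here on a trimmed-down stand-in for the get-table output (the names come from the question; most fields are omitted for brevity):

```shell
# Trimmed stand-in for what `aws glue get-table` returns (illustrative only).
sample='{"Table":{"Name":"my_table_name","DatabaseName":"my_database_name","CatalogId":"my_catalog_id","VersionId":"0","StorageDescriptor":{"Columns":[{"Name":"kind","Type":"string"},{"Name":"etag","Type":"string"},{"Name":"id","Type":"string"}]}}}'

tableinput=$( echo "$sample" \
    | jq '.Table
          | del(.DatabaseName, .CreateTime, .UpdateTime, .CreatedBy,
                .IsRegisteredWithLakeFormation, .CatalogId, .VersionId)
          # select the column by name instead of by a fixed index
          | (.StorageDescriptor.Columns[] | select(.Name == "id")).Type = "bigint"' )

echo "$tableinput"
```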
Then you can drop the --name option, which update-table does not accept (the table name is taken from the Name field inside the table input). With the disallowed keys removed the JSON is now valid, and quoting $tableinput keeps the shell from splitting it into separate arguments:

aws glue update-table \
    --database-name $databasename \
    --table-input "$tableinput"
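For what it's worth, the original Unknown options: --name, "Name":, "my_table_name", ... error is a shell quoting problem, not just a Glue one: an unquoted $tableinput is word-split, so every whitespace-separated JSON token is handed to the CLI as a separate argument. A small demonstration with a stand-in JSON string (no AWS call involved; count_args is a throwaway helper, not part of the CLI):

```shell
# Stand-in JSON; in the real script this comes from get-table.
tableinput='{ "Name": "my_table_name", "Retention": 0 }'

# Helper that just reports how many arguments it received.
count_args() { echo $#; }

count_args $tableinput     # unquoted: the shell splits the JSON into 6 arguments
count_args "$tableinput"   # quoted: the JSON arrives as a single argument
```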