I know I could easily do this through the AWS Glue console, but I'm trying to do it via the AWS CLI instead. I have a table my_table_name that contains a column id whose current type is string, and I want to change that type to bigint.

My current attempt is the code below. First I take the TableInput from get-table and change the third column (id) to bigint; then I update the Glue table with the modified TableInput, like this:
#!/bin/bash
tableinput=$( aws glue get-table \
    --database-name $databasename \
    --name $tablename \
    | json Table \
    | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )

aws glue update-table \
    --database-name $databasename \
    --name $tablename \
    --table-input $tableinput
For reference, echo $tableinput gives me this JSON:
{
  "Name": "my_table_name",
  "DatabaseName": "my_database_name",
  "CreateTime": "my_date",
  "UpdateTime": "my_date",
  "Retention": 0,
  "StorageDescriptor": {
    "Columns": [
      { "Name": "kind", "Type": "string" },
      { "Name": "etag", "Type": "string" },
      { "Name": "id", "Type": "bigint" },
      { "Name": "snippet_channelid", "Type": "string" },
      { "Name": "snippet_title", "Type": "string" },
      { "Name": "snippet_assignable", "Type": "boolean" }
    ],
    "Location": "my_location",
    "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
    "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
    "Compressed": true,
    "NumberOfBuckets": -1,
    "SerdeInfo": {
      "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
      "Parameters": { "serialization.format": "1" }
    },
    "BucketColumns": [],
    "SortColumns": [],
    "Parameters": {
      "CrawlerSchemaDeserializerVersion": "1.0",
      "classification": "parquet",
      "compressionType": "snappy",
      "typeOfData": "file"
    },
    "StoredAsSubDirectories": false
  },
  "PartitionKeys": [],
  "TableType": "EXTERNAL_TABLE",
  "Parameters": {
    "classification": "parquet",
    "compressionType": "snappy",
    "projection.enabled": "false",
    "typeOfData": "file"
  },
  "CreatedBy": "my_role",
  "IsRegisteredWithLakeFormation": false,
  "CatalogId": "my_catalog_id",
  "VersionId": "0"
}
However, I get this error:
Unknown options: --name, "Name":, "my_table_name",, "DatabaseName":, "my_database_name",, "CreateTime":, "my_date",, "UpdateTime":, "my_date",, "Retention":, 0,, "StorageDescriptor":, {, "Columns":, [, {, "Name":, "kind",, "Type":, "string", },, {, "Name":, "etag",, "Type":, "string", },, {, "Name":, "id",, "Type":, "bigint", },, {, "Name":, "snippet_channelid",, "Type":, "string", },, {, "Name":, "snippet_title",, "Type":, "string", },, {, "Name":, "snippet_assignable",, "Type":, "boolean", }, ],, "Location":, "s3://my_location",, "InputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",, "OutputFormat":, "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",, "Compressed":, true,, "NumberOfBuckets":, -1,, "SerdeInfo":, {, "SerializationLibrary":, "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",, "Parameters":, {, "serialization.format":, "1", }, },, "BucketColumns":, [],, "SortColumns":, [],, "Parameters":, {, "CrawlerSchemaDeserializerVersion":, "1.0",, "classification":, "parquet",, "compressionType":, "snappy",, "typeOfData":, "file", },, "StoredAsSubDirectories":, false, },, "PartitionKeys":, [],, "TableType":, "EXTERNAL_TABLE",, "Parameters":, {, "classification":, "parquet",, "compressionType":, "snappy",, "projection.enabled":, "false",, "typeOfData":, "file", },, "CreatedBy":, "my_role",, "IsRegisteredWithLakeFormation":, false,, "CatalogId":, "my_catalog_id",, "VersionId":, "0", }, my_table_name
Removing the --name option from update-table gives me:

aws.exe: error: the following arguments are required: --name
First, you need to reshape the output of get-table by deleting all of the keys that show up in the error message. They are returned by get-table, but they are not allowed in the TableInput structure that update-table accepts:
tableinput=$( aws glue get-table \
    --database-name $databasename \
    --name $tablename \
    | jq -r ".Table" \
    | jq "del(.DatabaseName,.CreateTime,.UpdateTime,.CreatedBy,.IsRegisteredWithLakeFormation,.CatalogId,.VersionId)" \
    | json -e "this.StorageDescriptor.Columns[2].Type='bigint'" )
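As a side note, the final json step can be folded into the same jq program, and the column can be selected by name rather than by a hard-coded index, which is safer if the column order ever changes. A minimal sketch, assuming jq is installed and run here on a trimmed-down stand-in for the get-table output (the names come from the question; most fields are omitted for brevity):

```shell
# Trimmed stand-in for what `aws glue get-table` returns (illustrative only).
sample='{"Table":{"Name":"my_table_name","DatabaseName":"my_database_name","CatalogId":"my_catalog_id","VersionId":"0","StorageDescriptor":{"Columns":[{"Name":"kind","Type":"string"},{"Name":"etag","Type":"string"},{"Name":"id","Type":"string"}]}}}'

tableinput=$( echo "$sample" \
    | jq '.Table
          | del(.DatabaseName, .CreateTime, .UpdateTime, .CreatedBy,
                .IsRegisteredWithLakeFormation, .CatalogId, .VersionId)
          # select the column by name instead of by a fixed index
          | (.StorageDescriptor.Columns[] | select(.Name == "id")).Type = "bigint"' )

echo "$tableinput"
```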
Then you can drop the --name option, which update-table does not accept (the table name is taken from the Name field inside the table input). With the disallowed keys removed the JSON is now valid, and quoting $tableinput keeps the shell from splitting it into separate arguments:

aws glue update-table \
    --database-name $databasename \
    --table-input "$tableinput"
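For what it's worth, the original Unknown options: --name, "Name":, "my_table_name", ... error is a shell quoting problem, not just a Glue one: an unquoted $tableinput is word-split, so every whitespace-separated JSON token is handed to the CLI as a separate argument. A small demonstration with a stand-in JSON string (no AWS call involved; count_args is a throwaway helper, not part of the CLI):

```shell
# Stand-in JSON; in the real script this comes from get-table.
tableinput='{ "Name": "my_table_name", "Retention": 0 }'

# Helper that just reports how many arguments it received.
count_args() { echo $#; }

count_args $tableinput     # unquoted: the shell splits the JSON into 6 arguments
count_args "$tableinput"   # quoted: the JSON arrives as a single argument
```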