How do I update values in a Delta table column with a map-of-struct data type in Azure Databricks?


I have a Delta table in Databricks named prod.silver.control_table. It has several columns, including table_name, which is a string, and transform_options, which has the following structure:

 |-- transform_options: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- col_name_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- type_mappings: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- partition_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- order_duplicates_by: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)

For example, when table_name is prod.silver.weather, transform_options is:

{
"prod.bronze.weather_source_a":{"col_name_mappings":{"col_a_old":"col_a_new","col_b_old":"col_b_new"},"type_mappings":{"col_a_new":"INT","col_b_new":"TIMESTAMP"},"partition_duplicates_by":["col_a_new"],"order_duplicates_by":["_commit_version"]},
"prod.bronze.weather_source_b":{"col_name_mappings":{"col_c_old":"col_c_new","col_d_old":"col_d_new"},"type_mappings":{"col_c_new":"INT","col_d_new":"TIMESTAMP"},"partition_duplicates_by":["col_c_new"],"order_duplicates_by":["ingestion_timestamp","_commit_version"]}
}
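
To try queries against this shape without touching the real control table, the schema and the example row can be reproduced in a scratch table. This is only a sketch: control_table_demo is a hypothetical name that does not appear in the question, and only the two columns described above are created.

```sql
-- Hypothetical scratch table; the name control_table_demo is an assumption.
CREATE TABLE IF NOT EXISTS control_table_demo (
  table_name STRING,
  transform_options MAP<STRING, STRUCT<
    col_name_mappings: MAP<STRING, STRING>,
    type_mappings: MAP<STRING, STRING>,
    partition_duplicates_by: ARRAY<STRING>,
    order_duplicates_by: ARRAY<STRING>
  >>
) USING DELTA;

-- Load the prod.silver.weather example row from the question.
INSERT INTO control_table_demo
SELECT
  'prod.silver.weather',
  map(
    'prod.bronze.weather_source_a', named_struct(
      'col_name_mappings', map('col_a_old', 'col_a_new', 'col_b_old', 'col_b_new'),
      'type_mappings', map('col_a_new', 'INT', 'col_b_new', 'TIMESTAMP'),
      'partition_duplicates_by', array('col_a_new'),
      'order_duplicates_by', array('_commit_version')
    ),
    'prod.bronze.weather_source_b', named_struct(
      'col_name_mappings', map('col_c_old', 'col_c_new', 'col_d_old', 'col_d_new'),
      'type_mappings', map('col_c_new', 'INT', 'col_d_new', 'TIMESTAMP'),
      'partition_duplicates_by', array('col_c_new'),
      'order_duplicates_by', array('ingestion_timestamp', '_commit_version')
    )
  );
```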

I need to update the values in order_duplicates_by. Specifically, I need to change _commit_version to commit_version by removing the leading underscore.

Any idea how to update the table values? I would prefer to use SQL.

arrays struct azure-databricks
1 Answer

You can use the code below to perform this kind of transformation.

SELECT
    table_name,
    transform_values(
        transform_options,
        (k, v) -> CASE
            -- Rewrite only the entries whose array contains the old name.
            WHEN array_contains(v.order_duplicates_by, '_commit_version') THEN
                named_struct(
                    'col_name_mappings', v.col_name_mappings,
                    'type_mappings', v.type_mappings,
                    'partition_duplicates_by', v.partition_duplicates_by,
                    -- Drop the old name, then append the new one at the end.
                    'order_duplicates_by', array_append(array_remove(v.order_duplicates_by, '_commit_version'), 'commit_version')
                )
            -- Otherwise keep the struct unchanged.
            ELSE v
        END
    ) AS updated_transform_options
FROM prod.silver.control_table

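Here, the transform_values function walks the map's keys and values and checks whether _commit_version is present in order_duplicates_by. If it is, the old string is removed and the desired one is appended; otherwise, the old value is kept. Note that remove-and-append places commit_version at the end of the array, which matches the sample data in the question, where _commit_version is already the last element. Also note that this SELECT only returns the transformed map; it does not modify the table.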
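
Since the question asks for the table itself to be changed, the same idea can be wrapped in an UPDATE. The following is a sketch rather than a tested statement. It assumes the table is prod.silver.control_table, as named in the question, and it swaps array_remove plus array_append for an element-wise transform so that the renamed entry keeps its position in the array.

```sql
-- Sketch only: try it on a copy of the table first.
UPDATE prod.silver.control_table
SET transform_options = transform_values(
  transform_options,
  (k, v) -> named_struct(
    'col_name_mappings', v.col_name_mappings,
    'type_mappings', v.type_mappings,
    'partition_duplicates_by', v.partition_duplicates_by,
    -- Rename in place so the element order is preserved.
    'order_duplicates_by', transform(
      v.order_duplicates_by,
      x -> CASE WHEN x = '_commit_version' THEN 'commit_version' ELSE x END
    )
  )
)
-- Touch only rows where at least one map value still contains the old name.
WHERE size(
  filter(
    map_values(transform_options),
    v -> array_contains(v.order_duplicates_by, '_commit_version')
  )
) > 0;
```

The WHERE clause keeps the update from rewriting rows that have nothing to rename. One caveat: in the rows that are updated, a map value that is NULL would come back as a struct of NULL fields, so wrap the named_struct in CASE WHEN v IS NULL THEN v ELSE ... END if the table can contain such values.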

Output:

table_name updated_transform_options
prod.silver.weather {"prod.bronze.weather_source_a":{"col_name_mappings":{"col_a_old":"col_a_new","col_b_old":"col_b_new"},"type_mappings":{"col_a_new":"INT","col_b_new":"TIMESTAMP"},"partition_duplicates_by":["col_a_new"],"order_duplicates_by":["commit_version"]}}
prod.silver.other_table {"prod.bronze.weather_source_b":{"col_name_mappings":{"col_x_old":"col_x_new","col_y_old":"col_y_new"},"type_mappings":{"col_x_new":"STRING","col_y_new":"DATE"},"partition_duplicates_by":["col_x_new"],"order_duplicates_by":["ingestion_timestamp","commit_version"]}}