I have a Delta table in Databricks named prod.silver.platform_ctl_ingestion. It has several columns, including table_name, which is a string, and transform_options, which has the following structure:
|-- transform_options: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- col_name_mappings: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- type_mappings: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- partition_duplicates_by: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- order_duplicates_by: array (nullable = true)
| | | |-- element: string (containsNull = true)
For example, when table_name is prod.silver.weather, transform_options is:
{
"prod.bronze.weather_source_a":{"col_name_mappings":{"col_a_old":"col_a_new","col_b_old":"col_b_new"},"type_mappings":{"col_a_new":"INT","col_b_new":"TIMESTAMP"},"partition_duplicates_by":["col_a_new"],"order_duplicates_by":["_commit_version"]},
"prod.bronze.weather_source_b":{"col_name_mappings":{"col_c_old":"col_c_new","col_d_old":"col_d_new"},"type_mappings":{"col_c_new":"INT","col_d_new":"TIMESTAMP"},"partition_duplicates_by":["col_c_new"],"order_duplicates_by":["ingestion_timestamp","_commit_version"]}
}
I want to select the rows where order_duplicates_by in transform_options contains _commit_version.
I tried the following query, but it returns all rows.
SELECT
*,
transform_values(transform_options, (k, v) ->
CASE
WHEN array_contains(v.order_duplicates_by, '_commit_version') THEN
STRUCT(v.col_name_mappings, v.type_mappings, v.partition_duplicates_by, v.order_duplicates_by)
ELSE NULL
END
) AS updated_transform_options
FROM prod.silver.platform_ctl_ingestion;
Any idea how to do this?
If you filter the rows first, it is easy to remove _commit_version from the array.
Query:
%sql
SELECT
*,
transform_values(transform_options, (k, v) ->
STRUCT(v.col_name_mappings, v.type_mappings, v.partition_duplicates_by, array_append(array_remove(v.order_duplicates_by, '_commit_version'), 'commit_version') as order_duplicates_by)
) AS updated_transform_options
FROM control_table
WHERE array_contains(map_values(transform_options)[0]['order_duplicates_by'],'_commit_version');
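There is no need to check for _commit_version inside transform_values. That function only rewrites the map values in the SELECT list and never drops rows, so the filtering belongs in the WHERE clause.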
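One caveat: the WHERE clause above looks only at the first map value ([0]), so a row where only a later source lists _commit_version might be missed. A minimal sketch of a filter that checks every entry, using the exists higher-order function over map_values, could look like the following. The inline control_table CTE and its single-field struct are simplified sample data assumed here for illustration, not the real table schema.

```sql
-- Hypothetical sample data: one row whose second source (not the first)
-- contains '_commit_version'. The struct is reduced to one field for brevity.
WITH control_table AS (
  SELECT
    'prod.silver.weather' AS table_name,
    map(
      'prod.bronze.weather_source_a',
      named_struct('order_duplicates_by', array('ingestion_timestamp')),
      'prod.bronze.weather_source_b',
      named_struct('order_duplicates_by', array('ingestion_timestamp', '_commit_version'))
    ) AS transform_options
)
SELECT *
FROM control_table
-- exists() is true if the predicate holds for at least one map value,
-- regardless of its position in the map.
WHERE exists(
  map_values(transform_options),
  v -> array_contains(v.order_duplicates_by, '_commit_version')
);
```

With this predicate the sample row should be selected because its second entry matches. The same WHERE clause can be combined with the transform_values expression from the query above if the array also needs to be rewritten.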