Spark SQL query to add a string column to an array of structs


I am working with Query Service in Adobe Experience Platform. It supports a limited set of Spark SQL functions, listed here.

I have the following table:

Name   AddressType    CustomerDetails
------------------------------------------------------------------------------------------      
John   home           [{"acctType":"Mortgage loan","acctID":101},{"acctType":"Home Equity loan","acctID":104},{"acctType":"Checking Account","acctID":105}]
John   work           [{"acctType":"Mortgage loan","acctID":101},{"acctType":"Home Equity loan","acctID":104},{"acctType":"Checking Account","acctID":105}]
John   office         [{"acctType":"Mortgage loan","acctID":101},{"acctType":"Home Equity loan","acctID":104},{"acctType":"Checking Account","acctID":105}]

Here Name and AddressType are of type String, and CustomerDetails has the schema type
array<struct<acctType:string, acctID:string>>

I need to add the value of the AddressType column to every struct in the CustomerDetails column. The final output should look like this:

Name   AddressType    CustomerDetails
--------------------------------------      
John   home           [{"AddressType":"home","acctType":"Mortgage loan","acctID":101}, {"AddressType":"home","acctType":"Home Equity loan","acctID":104}, {"AddressType":"home","acctType":"Checking Account","acctID":105}]
John   work           [{"AddressType":"work","acctType":"Mortgage loan","acctID":101}, {"AddressType":"work","acctType":"Home Equity loan","acctID":104}, {"AddressType":"work","acctType":"Checking Account","acctID":105}]
John   office         [{"AddressType":"office","acctType":"Mortgage loan","acctID":101}, {"AddressType":"office","acctType":"Home Equity loan","acctID":104}, {"AddressType":"office","acctType":"Checking Account","acctID":105}]

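I used the following query to add an extra field to the CustomerDetails column, but I can't figure out how to fill that field in col3 with the value of the AddressType column: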

SELECT Name, addressType,
(from_json(CustomerDetails, 'ARRAY<STRUCT<addressType: STRING, acctType: STRING, acctID: STRING>>')) AS col3
FROM CustomerTable 

I also looked at the TRANSFORM function, but couldn't find the SQL-specific syntax for using it. Any help with this would be greatly appreciated.

arrays apache-spark-sql
1 Answer

After some digging, I was finally able to come up with a solution, using this answer as a reference. Here is the query that worked for me:

SELECT ct1.Name, ct1.addressType, to_json(ct1.col2) AS CustomerDetails
FROM (
    SELECT ct.Name, ct.addressType,
           TRANSFORM(ct.col3, x -> struct(ct.addressType AS addressType, x.acctType AS acctType, x.acctID AS acctID)) AS col2
    FROM (
        SELECT Name, addressType,
               from_json(CustomerDetails, 'ARRAY<STRUCT<addressType: STRING, acctType: STRING, acctID: STRING>>') AS col3
        FROM CustomerTable
    ) ct
) ct1
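
The nested subqueries above can probably be collapsed into a single SELECT, since TRANSFORM accepts any array expression as its first argument. The following is only a sketch under a few assumptions: that CustomerDetails is stored as a JSON string (which the use of from_json suggests), that Query Service accepts the same lambda syntax as open-source Spark SQL, and that leaving addressType out of the parsing schema is acceptable because the source JSON does not contain that field. It has not been run against Adobe Experience Platform.

```sql
-- Sketch: parse the JSON string, rebuild each struct with the row's
-- addressType added, then serialize the array back to JSON.
SELECT
    ct.Name,
    ct.addressType,
    to_json(
        TRANSFORM(
            -- Assumption: CustomerDetails is a JSON string column.
            from_json(ct.CustomerDetails, 'ARRAY<STRUCT<acctType: STRING, acctID: STRING>>'),
            -- x is one element of the array; ct.addressType comes from the outer row.
            x -> struct(ct.addressType AS addressType, x.acctType AS acctType, x.acctID AS acctID)
        )
    ) AS CustomerDetails
FROM CustomerTable ct
```

The lambda x -> struct(...) is evaluated once per array element, so every struct in a row receives that row's addressType. If CustomerDetails is already an array<struct<...>> column rather than a string, the from_json and to_json calls would likely be unnecessary and TRANSFORM could be applied to the column directly.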