Below is the SQL query I am running in the Databricks SQL editor:
SELECT
  orders.GroceryStore,
  TO_JSON(COLLECT_LIST(MAP(
    'CustomerID', orders.CustomerID,
    'DiscountValue', orders.DiscountValue,
    'SalesAmount', orders.SalesAmount,
    'ChargesInfo', nested_json.ChargeDetails
  ))) AS JsonLine
FROM (
  SELECT
    h.GroceryStore,
    d.CustomerID,
    SUM(d.DiscountValue) AS DiscountValue,
    SUM(d.SalesAmount) AS SalesAmount
  FROM SalesHeader h
  LEFT JOIN SalesDetail d ON h.CustomerID = d.CustomerID
  WHERE h.Date >= '2024-01-01'
  GROUP BY h.GroceryStore, d.CustomerID
) AS orders
LEFT JOIN (
  SELECT
    d.CustomerID,
    TO_JSON(COLLECT_LIST(MAP(
      'ChargeType', d.ChargeType,
      'ChargeAmount', d.ChargeAmount,
      'PaidAmount', d.PaidAmount
    ))) AS ChargeDetails
  FROM ChargesDetail d
  GROUP BY d.CustomerID
) AS nested_json ON orders.CustomerID = nested_json.CustomerID
GROUP BY orders.GroceryStore;
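In the output, the ChargesInfo field contains the escape character \. Is there any way to modify the SQL query so that the output does not contain escape characters? The expected output is: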
[{"CustomerID":"0001ABC","DiscountValue":"126.33","SalesAmount":"2320.26","ChargesInfo":[{"ChargeType":"01","ChargeAmount":"20.26","PaidAmount":"11.22"}]}]
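Note that in the expected output, the ChargesInfo array is also not wrapped in double quotes ("").
Any help or suggestions would be greatly appreciated!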
I have corrected the syntax.
Sample data:
[{"CustomerID":"0001ABC","DiscountValue":"26.25","SalesAmount":"300.0","ChargesInfo":"[{\"ChargeType\":\"01\",\"ChargeAmount\":\"10.26\",\"PaidAmount\":\"5.62\"}]"}]
[{"CustomerID":"0002XYZ","DiscountValue":"5.25","SalesAmount":"150.0","ChargesInfo":"[{\"ChargeType\":\"02\",\"ChargeAmount\":\"20.5\",\"PaidAmount\":\"15.75\"}]"}]
I tried the following approach:
SELECT
  orders.GroceryStore,
  CONCAT(
    '[',
    CONCAT_WS(
      ',',
      COLLECT_LIST(
        CONCAT(
          '{"CustomerID":"', orders.CustomerID, '"',
          ',"DiscountValue":"', orders.DiscountValue, '"',
          ',"SalesAmount":"', orders.SalesAmount, '"',
          ',"ChargesInfo":', REPLACE(nested_json.ChargeDetails, '\\\\"', '"'),
          '}'
        )
      )
    ),
    ']'
  ) AS JsonLine
FROM (
  SELECT
    h.GroceryStore,
    d.CustomerID,
    SUM(d.DiscountValue) AS DiscountValue,
    SUM(d.SalesAmount) AS SalesAmount
  FROM SalesHeader h
  LEFT JOIN SalesDetail d ON h.CustomerID = d.CustomerID
  WHERE h.Date >= '2024-01-01'
  GROUP BY h.GroceryStore, d.CustomerID
) AS orders
LEFT JOIN (
  SELECT
    d.CustomerID,
    CONCAT(
      '[',
      CONCAT_WS(
        ',',
        COLLECT_LIST(
          CONCAT(
            '{"ChargeType":"', d.ChargeType, '"',
            ',"ChargeAmount":"', d.ChargeAmount, '"',
            ',"PaidAmount":"', d.PaidAmount, '"',
            '}'
          )
        )
      ),
      ']'
    ) AS ChargeDetails
  FROM ChargesDetail d
  GROUP BY d.CustomerID
) AS nested_json ON orders.CustomerID = nested_json.CustomerID
GROUP BY orders.GroceryStore;
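The REPLACE function is meant to replace the \" sequence (an escaped double quote) with ", effectively removing the escape character \. In this version ChargeDetails is built with CONCAT rather than TO_JSON, so it should not contain escaped quotes in the first place; the unescaped output comes mainly from concatenating the nested array as raw text instead of as a quoted string value.
Result: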
[{"CustomerID":"0001ABC","DiscountValue":"26.25","SalesAmount":"300.0","ChargesInfo":[{"ChargeType":"01","ChargeAmount":"10.26","PaidAmount":"5.62"}]}]
[{"CustomerID":"0002XYZ","DiscountValue":"5.25","SalesAmount":"150.0","ChargesInfo":[{"ChargeType":"02","ChargeAmount":"20.5","PaidAmount":"15.75"}]}]
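A possible alternative that avoids hand-built JSON strings: the escaping in the original query comes from calling TO_JSON twice, so the inner result is already a string by the time the outer TO_JSON quotes and escapes it again. The sketch below, assuming the same tables and columns as in the question, keeps the inner aggregate as an array of structs and serializes only once. It uses NAMED_STRUCT instead of MAP because a map needs a single value type, while a struct can hold strings alongside a nested array. The CAST(... AS STRING) calls and the alias charges are assumptions made here to keep the amounts quoted as in the expected output; drop the casts if JSON numbers are preferred.

```sql
SELECT
  orders.GroceryStore,
  -- Serialize once, at the outermost level only.
  TO_JSON(COLLECT_LIST(NAMED_STRUCT(
    'CustomerID',    orders.CustomerID,
    'DiscountValue', CAST(orders.DiscountValue AS STRING),
    'SalesAmount',   CAST(orders.SalesAmount AS STRING),
    'ChargesInfo',   charges.ChargeDetails  -- array of structs, not a JSON string
  ))) AS JsonLine
FROM (
  SELECT
    h.GroceryStore,
    d.CustomerID,
    SUM(d.DiscountValue) AS DiscountValue,
    SUM(d.SalesAmount) AS SalesAmount
  FROM SalesHeader h
  LEFT JOIN SalesDetail d ON h.CustomerID = d.CustomerID
  WHERE h.Date >= '2024-01-01'
  GROUP BY h.GroceryStore, d.CustomerID
) AS orders
LEFT JOIN (
  SELECT
    d.CustomerID,
    -- No TO_JSON here: keep the nested data typed so it is not double-encoded.
    COLLECT_LIST(NAMED_STRUCT(
      'ChargeType',   d.ChargeType,
      'ChargeAmount', CAST(d.ChargeAmount AS STRING),
      'PaidAmount',   CAST(d.PaidAmount AS STRING)
    )) AS ChargeDetails
  FROM ChargesDetail d
  GROUP BY d.CustomerID
) AS charges ON orders.CustomerID = charges.CustomerID
GROUP BY orders.GroceryStore;
```

Compared with string concatenation, this leaves the quoting of values to TO_JSON, which matters if a value ever contains a double quote or a backslash. It also sidesteps CONCAT returning NULL for a whole row when one input is NULL, for example a customer with no matching charges in the LEFT JOIN.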