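I have a table in AWS Redshift Spectrum with a column named "data".
Each cell in "data" contains an array of JSON objects. A single cell might look like this (as retrieved from the source), and the table has many rows:
Column name: data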
[
{"id":1,"group":{"group_id":3,"coordinates":[23.1,23.5]}},
{"id":2,"group":{"group_id":2,"coordinates":[25.1,25.5]}},
{"id":5,"group":{"group_id":3,"coordinates":[24.1,24.5]}}
]
I want to query the data column to extract all of this information, with one row per id.
The following works:
select
x.id,
x.group.group_id
from table_name a, a.data as x
It returns a table with the columns "id" and "group_id".
Here is what I would like to do:
select x.id,
x.group.group_id,
x.group.coordinates[0] as latitude,
x.group.coordinates[1] as longitude
from table_name a, a.data as x
This returns an error:
ARRAY type "x.group.coordinates" can only occur in the FROM or the SELECT clause
I also tried selecting "group" on its own, as a sort of string (select x.group from ...), which returns this error:
Struct type "x.group" cannot be accessed directly. Hint: Use dot notation to access specific attributes of the struct.
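I could cross join the coordinates to split them into separate rows, but then I would need to pivot them back out and might lose the correct order (which also seems backwards to me):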
select
x.id,
x.group.group_id,
y
from table_name a, a.data as x, x.group.coordinates y
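Alternatively, I could use a subquery on the coordinates array, but that only works with the array in the "from" clause, and I don't know how to get the first and last values (rather than max and min, which will not always work):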
SELECT
x.id
,(select min(coord) from x.group.coordinates coord) as latitude
,(select max(coord) from x.group.coordinates coord) as longitude
from table_name a, a.data as x
Any ideas? It feels like this should be simple.
The AWS Redshift documentation covers this: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html
It includes an example of unnesting a multi-level array, which should give you what you need. It looks like this:
CREATE TABLE foo AS SELECT json_parse('[[1.1, 1.2], [2.1, 2.2], [3.1, 3.2]]') AS multi_level_array;
SELECT array, element FROM foo AS f, f.multi_level_array AS array, array AS element;
array | element
-----------+---------
[1.1,1.2] | 1.1
[1.1,1.2] | 1.2
[2.1,2.2] | 2.1
[2.1,2.2] | 2.2
[3.1,3.2] | 3.1
[3.1,3.2] | 3.2
(6 rows)
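Applied to the data in the question, a rough sketch might look like the following. It assumes the column is available as a SUPER value (here a hypothetical table `table_name_super` built with `json_parse`). Nothing above confirms that the same syntax works against a Spectrum external column with nested struct/array types, so treat it as something to try rather than a confirmed fix. The `AT idx` clause is the array-index form described in the Redshift unnesting documentation. The casts to `int` and `float` and the 0-based index are assumptions about the data and the engine behavior.

```sql
-- Hypothetical SUPER copy of the sample cell from the question.
CREATE TABLE table_name_super AS
SELECT json_parse('[{"id":1,"group":{"group_id":3,"coordinates":[23.1,23.5]}},{"id":2,"group":{"group_id":2,"coordinates":[25.1,25.5]}},{"id":5,"group":{"group_id":3,"coordinates":[24.1,24.5]}}]') AS data;

-- Unnest the objects, then unnest coordinates while keeping each element's
-- position in idx (assumed 0-based), so the original order is not lost.
-- "group" is quoted because GROUP is a reserved word.
SELECT
    x.id::int AS id,
    x."group".group_id::int AS group_id,
    -- Pivot the two coordinate rows back into columns by array position.
    MAX(CASE WHEN idx = 0 THEN coord::float END) AS latitude,
    MAX(CASE WHEN idx = 1 THEN coord::float END) AS longitude
FROM table_name_super AS a,
     a.data AS x,
     x."group".coordinates AS coord AT idx
GROUP BY 1, 2;
```

Pivoting on `idx` instead of `min`/`max` keeps latitude and longitude tied to their positions in the array, which matters when the first value is not the smaller one. If the column really is SUPER, bracket indexing such as `x."group".coordinates[0]` may also work directly, since SUPER navigation supports array subscripts.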