Cannot query a nested array inside a Redshift struct

Question

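I have a table in AWS Redshift Spectrum that contains a column named "data".

Each cell in "data" contains an array of JSON objects. A single data cell might look like this (as retrieved from the source), and there will be multiple rows:

Column name: data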

[
{"id":1,"group":{"group_id":3,"coordinates":[23.1,23.5]}},
{"id":2,"group":{"group_id":2,"coordinates":[25.1,25.5]}},
{"id":5,"group":{"group_id":3,"coordinates":[24.1,24.5]}}
]

I want to query the data column to get all of this information, keeping one row per id.

The following works:

select 
x.id, 
x.group.group_id
from table_name a, a.data as x

It returns a table with the columns "id" and "group_id".

Here is what I want to do:

select x.id, 
x.group.group_id, 
x.group.coordinates[0] as latitude, 
x.group.coordinates[1] as longitude
from table_name a, a.data as x

This returns the error:

ARRAY type "x.group.coordinates" can only occur in the FROM or the SELECT clause

I tried querying just "group" as a sort of string (select x.group from ...), which returns this error:

Struct type "x.group" cannot be accessed directly. Hint: Use dot notation to access specific attributes of the struct.

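I could cross join the coordinates to split them into separate rows, but then I would need to pivot them back out and could lose the correct order (which also seems backwards to me)?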

select 
x.id, 
x.group.group_id,
y

from table_name a, a.data as x, x.group.coordinates y

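Alternatively, I could subquery the coordinates array, but this only works in a "from" statement, and I can't figure out how to get the first and last values (rather than the max and min, which won't always work):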

SELECT 
    x.id
    ,(select min(coord) from x.group.coordinates coord) as latitude
    ,(select max(coord) from x.group.coordinates coord) as longitude
    from table_name a, a.data as x

Any ideas? It feels like this should be simple.

Answer 1

The AWS Redshift documentation covers this: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html

It has an example of unnesting multilevel arrays that should give you what you need. It looks like:

CREATE TABLE foo AS SELECT json_parse('[[1.1, 1.2], [2.1, 2.2], [3.1, 3.2]]') AS multi_level_array;

SELECT array, element FROM foo AS f, f.multi_level_array AS array, array AS element;

   array   | element
-----------+---------
 [1.1,1.2] | 1.1
 [1.1,1.2] | 1.2
 [2.1,2.2] | 2.1
 [2.1,2.2] | 2.2
 [3.1,3.2] | 3.1
 [3.1,3.2] | 3.2
(6 rows)
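
As a rough sketch of how the same idea might apply to the data in the question: if the column is loaded as a SUPER value (as in the json_parse example above), the linked documentation describes indexing into arrays with bracket notation directly in the SELECT list. The table name bar is hypothetical, and this assumes a native Redshift SUPER column rather than a Spectrum external table with struct and array types, which may behave differently. Quoting "group" is also an assumption, made because GROUP is a reserved word.

```sql
-- Hypothetical table "bar"; the JSON is the sample cell from the question,
-- parsed into a SUPER value.
CREATE TABLE bar AS
SELECT json_parse('[
  {"id":1,"group":{"group_id":3,"coordinates":[23.1,23.5]}},
  {"id":2,"group":{"group_id":2,"coordinates":[25.1,25.5]}},
  {"id":5,"group":{"group_id":3,"coordinates":[24.1,24.5]}}
]') AS data;

-- Unnest the outer array into one row per object (b.data AS x),
-- then read the inner array by position instead of unnesting it.
-- "group" is quoted on the assumption that the reserved word needs it.
SELECT x.id,
       x."group".group_id,
       x."group".coordinates[0] AS latitude,
       x."group".coordinates[1] AS longitude
FROM bar AS b, b.data AS x;
```

Reading the coordinates by position keeps one row per id and avoids the pivot step, since the array order is never flattened into rows.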
