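I can't tell whether Snowflake supports correlated subqueries in the select clause, because I have run into conflicting evidence.
The Snowflake querying-subqueries documentation says that correlated subqueries are supported in the where clause, which seems to imply that they are not supported in the select clause. Community discussions such as here and here seem to confirm this.
However...
Using the Chinook database as a model, this SQL statement with a correlated subquery in the select clause works.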
-- Three levels: artist, album, track
-- Correlated sub-query: level 2 -> level 3
-- Works: ✓
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track
where track.albumid = album.albumid) y
) tracks
from album
where album.artistid = 1) x
) albums
from
artist
where artistid = 1) x;
So does this one...
-- Three levels: artist, album, track
-- Correlated sub-query: level 1 -> level 2
-- Works: ✓
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track
where track.albumid = 1) y
) tracks
from album
where album.artistid = artist.artistid) x
) albums
from
artist
where artistid = 1) x;
This one, however, does not...
-- Three levels: artist, album, track
-- Correlated sub-query: level 1 -> level 2 -> level 3
-- Doesn't work: ✗
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track
where track.albumid = album.albumid) y
) tracks
from album
where album.artistid = artist.artistid) x
) albums
from
artist
where artistid = 1) x;
Error: SQL compilation error:
Unsupported subquery type cannot be evaluated
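It is as if correlated subqueries are half-supported in the select clause: they work as long as the references span no more than two levels. Yet even this limited support contradicts the documentation and the conventional wisdom on the community forums and StackOverflow.
I tried correlated subqueries in the select clause with two levels and with three levels of nesting. I expected that none of them would work, though I would also have accepted all of them working. What I did not expect was that some of them (two levels) would work while others (three levels) would not.
Addendum
You have multiple subqueries with the same alias; I would try different names and see whether the correlation works any better.
This raises the same error as above: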
-- Three levels: artist, album, track
-- Correlated sub-query: level 1 -> level 2 -> level 3
-- Doesn't work: ✗
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track t3
where t3.albumid = t2.albumid) y
) tracks
from album t2
where t2.artistid = t1.artistid) x
) albums
from
artist t1
where artistid = 1);
Can you post EXPLAIN USING TABULAR for both cases?
-- Three levels: artist, album, track
-- Correlated sub-query: level 1 -> level 2
-- Works: ✓
explain using tabular
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track t3
where t3.albumid = 1) y
) tracks
from album t2
where t2.artistid = t1.artistid) x
) albums
from
artist t1
where artistid = 1) x;
step | id | parent | operation | objects | alias | expressions | partitionstotal | partitionsassigned | bytesassigned
-----+----+--------+---------------+-----------------------+-------+--------------------------------+-----------------+--------------------+--------------
| | | GlobalStats | | | | 3 | 3 | 116224
1 | 0 | | Result | | | ARRAY_AGG(OBJECT_CONSTRUCT(... | | |
1 | 1 | 0 | Aggregate | | | aggExprs: [ARRAY_AGG(OBJECT... | | |
1 | 2 | 1 | Filter | | | T3.ALBUMID = 1 | | |
1 | 3 | 2 | TableScan | CHINOOK.PUBLIC.TRACK | T3 | TRACKID, NAME, ALBUMID, MED... | 1 | 1 | 101888
2 | 0 | | Result | | | ARRAY_AGG(OBJECT_CONSTRUCT(... | | |
2 | 1 | 0 | Aggregate | | | aggExprs: [ARRAY_AGG(OBJECT... | | |
2 | 2 | 1 | LeftOuterJoin | | | joinKey: (T2.ARTISTID = T1.... | | |
2 | 3 | 2 | Filter | | | ARRAY_AGG(OBJECT_CONSTRUCT(... | | |
2 | 4 | 3 | Aggregate | | | aggExprs: [ARRAY_AGG(OBJECT... | | |
2 | 5 | 4 | Filter | | | T2.ARTISTID = 1 | | |
2 | 6 | 5 | TableScan | CHINOOK.PUBLIC.ALBUM | T2 | ALBUMID, TITLE, ARTISTID | 1 | 1 | 8192
2 | 7 | 2 | Filter | | | T1.ARTISTID = 1 | | |
2 | 8 | 7 | TableScan | CHINOOK.PUBLIC.ARTIST | T1 | ARTISTID, NAME | 1 | 1 | 6144
-- Three levels: artist, album, track
-- Correlated sub-query: level 1 -> level 2 -> level 3
-- Doesn't work: ✗
explain using tabular
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*,
(
select
array_agg(object_construct(*))
from (
select
*
from track t3
where t3.albumid = t2.albumid) y
) tracks
from album t2
where t2.artistid = t1.artistid) x
) albums
from
artist t1
where artistid = 1);
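Same error.
The simplest answer is "it doesn't work", because from a performance perspective correlated subqueries are awful and should never be used.
In some simple cases, though, they do sometimes work. They then blow up into cases that the current optimizer cannot resolve, hence the error.
On one level, you can point to "some other database" and say that it handles this, and yes, it does. The next question is then why you are not using that database... and one reason I would suggest is performance.
The best way to get high-performance results is to understand your data and write SQL that deals with every quirk of your data set, and no more. A general-purpose solution that has to account for "all edge cases" therefore performs poorly by definition.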
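As a rough, untested sketch of what that could look like here: the three-level artist -> album -> track result can be built with joins against pre-aggregated subqueries instead of correlated subqueries. The column names are taken from the EXPLAIN output above; the aliases a, al and t are just illustrative. One assumption to check: object_construct drops keys whose value is NULL, so an artist with no albums (or an album with no tracks) would simply lack the albums (or tracks) key, which may differ from what the correlated version returns.

```sql
-- Sketch only: same nesting as the failing query, written with joins
-- and GROUP BY rather than correlated subqueries in the select clause.
select
    array_agg(object_construct(
        'artistid', a.artistid,
        'name',     a.name,
        'albums',   al.albums))
from artist a
left join (
    -- One row per artist, carrying that artist's albums as an array.
    select
        album.artistid,
        array_agg(object_construct(
            'albumid',  album.albumid,
            'title',    album.title,
            'artistid', album.artistid,
            'tracks',   t.tracks)) as albums
    from album
    left join (
        -- One row per album, carrying that album's tracks as an array.
        select
            albumid,
            array_agg(object_construct(*)) as tracks
        from track
        group by albumid
    ) t on t.albumid = album.albumid
    group by album.artistid
) al on al.artistid = a.artistid
where a.artistid = 1;
```

Each level is aggregated once on its join key and then joined to its parent, so nothing here depends on how many levels of correlation the optimizer can decorrelate.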