Is it correct to put "UNNEST" in the "ON" condition of a (LEFT) JOIN?

Question

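Suppose I have a BigQuery table "table1" with a column "field1", and a table "table2" with an array column "field2".

I want to join each row of table1 to every row of table2 where the value of "field1" appears in the array "field2".

I tried the query below, but it seems to take far too long in BigQuery. In the case I tested it never actually finished, so I assume something is wrong with it.

Is there a better way to achieve this?

Test query: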

SELECT *
FROM table1
LEFT JOIN table2 ON table1.field1 IN UNNEST(table2.field2)

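Result:

Operation timed out after 6.0 hours. Consider reducing the amount of work performed by your operation so that it can complete within this limit.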

Answer 1

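The joined table may have far more rows than expected. table2 may also be much larger than table1, so a right join could be more efficient. First we determine the final row count of the joined table.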

WITH table1 as (SELECT 'a'||x as field1 from unnest(generate_array(1,1000)) as x),
table2 as (SELECT x, ((SELECT  array_agg(if(y=x or y in (5,6,7,8),'',y||'')||'a'||y) as field2 from unnest(generate_array(0,1000)) as y)) as field2 from unnest(generate_array(1,1000)) as x)

,helper as (SELECT distinct field1 from table1)
,test as (
SELECT #*,
(( SELECT struct(count(x) as counts,array_agg(x) as data) from unnest(field2) as x inner join helper on field1=x )).*
 from table2 
)
#gather statistics
SELECT count(1) as row_counts_table2, sum(counts) as row_counts_join, min(counts) as min, max(counts) as max, from test

#solution with right join 
#SELECT * FROM (SELECT * FROM test, unnest(data) as data2join) right join table1 on field1= data2join

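CTEs create two temporary tables of sample data (table1 and table2). Because of `y in (...)`, several values of `field1` match entries in the array `field2`.
A CTE named helper stores the distinct values of field1 from table1. We only want each entry once.
The CTE test uses a subselect to perform an inner join between field2 and field1. A `struct` holds the count and all matching entries.
The `counts` field gives some information, such as the final row count of the joined table. Is that size feasible?
Unnesting the data array from the CTE test yields all possible join rows from table2. The right join with table1 keeps every entry of table1 and adds the unnested entries.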
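
The same idea can also be sketched without the helper and test CTEs: flatten the array first, then join on plain equality. This is only a minimal sketch. The sample rows, the `id` column, and the names `table2_flat` and `element` are assumptions made for illustration and are not part of the original question. The likely benefit is that an equality condition gives BigQuery a join key, whereas `IN UNNEST(...)` in the ON clause probably forces every pair of rows to be compared.

```sql
-- Hypothetical sample data; the real table1 and table2 would replace these CTEs.
WITH table1 AS (
  SELECT 'a' || CAST(x AS STRING) AS field1
  FROM UNNEST(GENERATE_ARRAY(1, 5)) AS x
),
table2 AS (
  SELECT 1 AS id, ['a1', 'a2'] AS field2
  UNION ALL
  SELECT 2 AS id, ['a2', 'a9'] AS field2
),
-- Flatten the array once: one row per (table2 row, array element).
table2_flat AS (
  SELECT t2.id, t2.field2, element
  FROM table2 AS t2
  CROSS JOIN UNNEST(t2.field2) AS element
)
-- Plain equality join; table1 rows without a match are kept with NULLs.
SELECT t1.field1, f.id, f.field2
FROM table1 AS t1
LEFT JOIN table2_flat AS f
  ON t1.field1 = f.element
ORDER BY t1.field1, f.id;
```

With this sample data, 'a1' matches id 1, 'a2' matches ids 1 and 2, and 'a3' to 'a5' come back with NULLs. One difference from the `IN UNNEST` version: if an array contains the same value more than once, the flattened join returns the matching row once per occurrence, so the elements may need de-duplicating before the join.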
