I am trying to understand why the following query returns only one row despite the LEFT JOIN:
with t1(day_partition, entity_id, feature_1) AS (values
('2020-05-15', 'id_1', 'x'),
('2020-05-15', 'id_2', 'y')
),
t2(day_partition, entity_id, feature_2) AS (values
('2020-05-15', 'id_1', 1)
)
SELECT
t1.day_partition AS day_partition_1,
t2.day_partition AS day_partition_2,
t1.entity_id AS entity_id_1,
t2.entity_id AS entity_id_2
FROM
t1
LEFT JOIN
t2
ON
t1.entity_id = t2.entity_id
WHERE
t2.day_partition = '2020-05-15'
;
It returns:
day_partition_1 | day_partition_2 | entity_id_1 | entity_id_2
-----------------+-----------------+-------------+-------------
2020-05-15 | 2020-05-15 | id_1 | id_1
However, removing the filter
WHERE
t2.day_partition = '2020-05-15'
returns:
day_partition_1 | day_partition_2 | entity_id_1 | entity_id_2
-----------------+-----------------+-------------+-------------
2020-05-15 | 2020-05-15 | id_1 | id_1
2020-05-15 | NULL | id_2 | NULL
I find this behavior unintuitive. What is the rule behind it?
If I move the condition into the JOIN, it works as expected:
SELECT
t1.day_partition AS day_partition_1,
t2.day_partition AS day_partition_2,
t1.entity_id AS entity_id_1,
t2.entity_id AS entity_id_2
FROM
t1
LEFT JOIN
t2
ON
t1.entity_id = t2.entity_id AND t2.day_partition = '2020-05-15'
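The query parser does not know what you have in mind. If you filter in the WHERE clause, the filter applies to every row of the joined result, not only to the rows of the joined table.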
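This is by design. Conditions in the WHERE clause are mandatory, so putting a condition on the LEFT JOINed table there ends up evicting the rows for which the LEFT JOIN came back empty: in those rows every t2 column is NULL, and NULL = '2020-05-15' never evaluates to true. In effect, this turns the LEFT JOIN into an INNER JOIN.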
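The effect described above can be reproduced with a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in engine, which is not necessarily the engine used in the question; the helper name run is hypothetical. The data is copied from the question, and the same LEFT JOIN is run with no filter, with the filter in WHERE, and with the filter in ON.

```python
import sqlite3

# Sample data copied from the question. SQLite is only a stand-in engine here
# (an assumption); the NULL semantics shown are standard SQL.
BASE = """
WITH t1(day_partition, entity_id, feature_1) AS (VALUES
    ('2020-05-15', 'id_1', 'x'),
    ('2020-05-15', 'id_2', 'y')
),
t2(day_partition, entity_id, feature_2) AS (VALUES
    ('2020-05-15', 'id_1', 1)
)
SELECT t1.entity_id, t2.entity_id, t2.day_partition
FROM t1
LEFT JOIN t2
"""


def run(tail):
    """Hypothetical helper: append a join condition (and optional WHERE) and run it."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(BASE + tail + " ORDER BY t1.entity_id").fetchall()
    finally:
        conn.close()


# 1. Plain LEFT JOIN: the unmatched t1 row survives with NULLs for t2.
no_filter = run("ON t1.entity_id = t2.entity_id")

# 2. Filter in WHERE: applied after the join, so the NULL-extended row is dropped.
where_filter = run(
    "ON t1.entity_id = t2.entity_id WHERE t2.day_partition = '2020-05-15'"
)

# 3. Filter in ON: it only decides which t2 rows match; t1 rows are all kept.
on_filter = run(
    "ON t1.entity_id = t2.entity_id AND t2.day_partition = '2020-05-15'"
)

# Comparing NULL with a value yields NULL (None in Python), which is not true,
# and that is why WHERE discards the unmatched row.
conn = sqlite3.connect(":memory:")
null_cmp = conn.execute("SELECT NULL = '2020-05-15'").fetchone()[0]
conn.close()

print(len(no_filter), len(where_filter), len(on_filter))  # 2 1 2
print(null_cmp)  # None
```

The row counts show the difference: the WHERE variant loses the id_2 row, while the ON variant returns the same two rows as the unfiltered join.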
You need to put all predicates that refer to columns of the LEFT JOINed table in the ON clause of the join:
FROM t1
LEFT JOIN t2
ON t1.entity_id = t2.entity_id
AND t2.day_partition = '2020-05-15'
Looking at the result set, I tend to think that you actually want the condition on t1:
FROM t1
LEFT JOIN t2
ON t1.entity_id = t2.entity_id
WHERE t1.day_partition = '2020-05-15'
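
As a quick check of this last variant, here is a small sketch under the same assumption that SQLite (via Python's sqlite3 module) is an acceptable stand-in for the engine in the question. Filtering on a t1 column in WHERE does not touch the NULL-extended t2 columns, so both rows are kept.

```python
import sqlite3

# Data copied from the question; the filter is on the left-hand table t1.
QUERY = """
WITH t1(day_partition, entity_id, feature_1) AS (VALUES
    ('2020-05-15', 'id_1', 'x'),
    ('2020-05-15', 'id_2', 'y')
),
t2(day_partition, entity_id, feature_2) AS (VALUES
    ('2020-05-15', 'id_1', 1)
)
SELECT t1.entity_id, t2.entity_id
FROM t1
LEFT JOIN t2
  ON t1.entity_id = t2.entity_id
WHERE t1.day_partition = '2020-05-15'
ORDER BY t1.entity_id
"""

conn = sqlite3.connect(":memory:")
rows_t1_filter = conn.execute(QUERY).fetchall()
conn.close()

# The id_2 row is preserved because the WHERE predicate only reads t1 columns.
print(rows_t1_filter)  # [('id_1', 'id_1'), ('id_2', None)]
```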