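We ran into an issue where an inner join does not behave as expected. Our dataset essentially joins a set of dates (from the current week back through the past few weeks) to a set of sections, depending on whether each section starts in or before that week and ends in or after that week. Although this query originally gave us the expected results, this week it started giving us incorrect results. After a bunch of tinkering, we found that if we change the query to a LEFT JOIN and then filter it with a WHERE clause, it gives us the correct results again.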
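What is the difference? Why does one work and the other not? (Bonus points: why did the original query work for weeks before suddenly showing this error?) Running the same inner join on Redshift produces the correct results, so this appears to be a Snowflake nuance that we do not understand.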
Original query:
WITH week_list AS (
    SELECT DATEADD(week, -4, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
    UNION ALL
    SELECT DATEADD(week, 1, week_value)
    FROM week_list
    WHERE DATEADD(week, 1, week_value) < CURRENT_DATE()
),
active_sections_per_week AS (
    SELECT wl.week_value
         , s.id section_id
    FROM week_list wl
    JOIN schema.sections s
      ON wl.week_value >= DATE_TRUNC(week, s.starts_at)
     AND wl.week_value <= DATE_TRUNC(week, s.ends_at)
)
SELECT aspw.week_value
     , COUNT(DISTINCT aspw.section_id) count_sections
FROM active_sections_per_week aspw
GROUP BY 1
ORDER BY 1 DESC
Result: one row, dated 2019-12-30 (4 weeks ago). No data for the three most recent weeks.
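Note: if you adjust the DATEADD in the first CTE, the join always seems to succeed for whatever the first returned date is, and only for that date. This behavior started within the last week. Before that, the query returned the expected number of rows (in other words, one row per week specified in the first DATEADD).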
"Fixed" query:
WITH week_list AS (
    SELECT DATEADD(week, -4, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
    UNION ALL
    SELECT DATEADD(week, 1, week_value)
    FROM week_list
    WHERE DATEADD(week, 1, week_value) < CURRENT_DATE()
),
active_sections_per_week AS (
    SELECT wl.week_value
         , s.id section_id
    FROM week_list wl
    LEFT JOIN schema.sections s
      ON wl.week_value >= DATE_TRUNC(week, s.starts_at)
     AND wl.week_value <= DATE_TRUNC(week, s.ends_at)
    WHERE s.id IS NOT NULL
)
SELECT aspw.week_value
     , COUNT(DISTINCT aspw.section_id) count_sections
FROM active_sections_per_week aspw
GROUP BY 1
ORDER BY 1 DESC
Result: four rows, dated 2019-12-30 through 2020-01-20, with the appropriate section counts.
This is a recursive CTE on "week_list". Redshift does not support recursive CTEs. The week list can be built without recursion instead:
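WITH week_list AS (
    -- column1 is the default name of the single VALUES column; alias the
    -- result so the later CTEs can keep referring to week_value.
    SELECT DATEADD(week, column1, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
    FROM VALUES (-4),(-3),(-2),(-1),(0)
)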
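For context, here is a sketch of how that non-recursive week list might slot into the original query. It assumes Snowflake and the same schema.sections table (with id, starts_at and ends_at) as in the question. It has not been run against the asker's data, so whether it avoids the missing rows is untested. One difference to be aware of: this list always includes the current week (offset 0), whereas the recursive version leaves the current week out when CURRENT_DATE() is itself the first day of the week.

```sql
-- Sketch only: week_list built from a fixed VALUES list instead of recursion.
WITH week_list AS (
    -- Offsets -4..0 give the four previous week starts plus the current one.
    SELECT DATEADD(week, column1, DATE_TRUNC(week, CURRENT_DATE())) AS week_value
    FROM VALUES (-4), (-3), (-2), (-1), (0)
),
active_sections_per_week AS (
    -- A section is active in a week if it starts in or before that week
    -- and ends in or after it (same join condition as the original query).
    SELECT wl.week_value
         , s.id AS section_id
    FROM week_list wl
    JOIN schema.sections s
      ON wl.week_value >= DATE_TRUNC(week, s.starts_at)
     AND wl.week_value <= DATE_TRUNC(week, s.ends_at)
)
SELECT aspw.week_value
     , COUNT(DISTINCT aspw.section_id) AS count_sections
FROM active_sections_per_week aspw
GROUP BY 1
ORDER BY 1 DESC;
```

Because the inner join drops weeks with no matching section, any week without active sections will still be absent from the output, just as in the "fixed" LEFT JOIN version with its WHERE s.id IS NOT NULL filter.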