I have a time-series style transaction table. To fetch the latest update record for each transaction, I use the query below:
WITH transactions AS (
  SELECT DISTINCT
    statuses.id,
    statuses.transaction_id,
    statuses.created_on,
    statuses.status
  FROM
    `gringotts_dwh.transaction_status` statuses
  INNER JOIN (
    SELECT DISTINCT
      transaction_id,
      MAX(created_on) AS created_on,
    FROM
      `gringotts_dwh.transaction_status`
    GROUP BY
      transaction_id) latest_update
  ON
    statuses.transaction_id = latest_update.transaction_id
    AND statuses.created_on = latest_update.created_on
)
SELECT * FROM transactions
However, with this query some records are still missing from the result. They are listed below:
id | transaction_id | created_on | status |
---|---|---|---|
11488196 | 6232804 | 2023-04-08 11:57:28 UTC | 53 |
11480223 | 6232245 | 2023-04-05 01:33:39 UTC | 43 |
11487410 | 6226866 | 2023-04-07 09:41:41 UTC | 32 |
11492618 | 6227333 | 2023-04-06 22:50:18 UTC | 102 |
11479541 | 6235787 | 2023-04-05 11:09:47 UTC | null |
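These records exist in the base table itself, but they are missing after the join.
If I change the query from CTE + subquery to subqueries only, I do get these records in the result. The updated query is as follows: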
SELECT statuses.transaction_status FROM
  (SELECT DISTINCT
    statuses.id,
    statuses.transaction_id,
    statuses.created_on,
    statuses.status AS transaction_status
  FROM
    `gringotts_dwh.transaction_status` statuses
  WHERE
    id IN (11492618,11488196,11487410,11480223,11479541)
  ORDER BY
    created_on DESC) statuses
INNER JOIN
  (SELECT DISTINCT
    transaction_id,
    MAX(created_on) AS created_on,
  FROM
    `gringotts_dwh.transaction_status`
  WHERE
    id IN (11492618,11488196,11487410,11480223,11479541)
  GROUP BY
    transaction_id
  ORDER BY
    created_on DESC) latest_update
ON
  statuses.transaction_id = latest_update.transaction_id
  AND statuses.created_on = latest_update.created_on
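One difference between the two queries is that the second one applies the `id IN (...)` filter inside both subqueries. Its `MAX(created_on)` is therefore computed over those five rows only, while the first query computes it over the whole table. The check below is a minimal sketch of how that could be tested, not a confirmed explanation. It compares each of the five rows against the table-wide latest `created_on` for its `transaction_id`. The aliases `s` and `m` and the output columns `latest_created_on` and `is_latest_in_full_table` are names chosen for illustration only.

```sql
-- Diagnostic sketch: is each of the five rows really the latest row
-- for its transaction_id when the whole table is considered?
SELECT
  s.id,
  s.transaction_id,
  s.created_on,
  m.latest_created_on,                                       -- illustrative column name
  s.created_on = m.latest_created_on AS is_latest_in_full_table  -- illustrative column name
FROM
  `gringotts_dwh.transaction_status` AS s
INNER JOIN (
  -- Table-wide maximum per transaction, with no id filter applied.
  SELECT
    transaction_id,
    MAX(created_on) AS latest_created_on
  FROM
    `gringotts_dwh.transaction_status`
  GROUP BY
    transaction_id) AS m
ON
  s.transaction_id = m.transaction_id
WHERE
  s.id IN (11492618, 11488196, 11487410, 11480223, 11479541)
ORDER BY
  s.transaction_id
```

If `is_latest_in_full_table` comes back `false` for a row, a later status row exists for that transaction, which would account for the first query leaving it out.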
Can someone explain this behavior to me?
Thanks in advance.