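I have two tables, c0 and m0, which I combine with a LEFT JOIN. The result is a table with 13 rows. I want to add the first date that falls within the following 40-day window. The data: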
DROP TABLE IF EXISTS test0.c0;
CREATE TABLE test0.c0 (
id INTEGER
, cons_dt DATE
);
INSERT INTO test0.c0 VALUES
('1','2000-01-01')
,('1','2000-02-01')
,('1','2000-03-01')
,('1','2000-04-01')
,('1','2000-05-01')
,('1','2000-06-01')
,('1','2000-07-01')
,('1','2000-08-01')
,('1','2000-09-01')
,('1','2000-10-01')
,('1','2000-11-01')
,('1','2000-12-01')
;
DROP TABLE IF EXISTS test0.m0;
CREATE TABLE test0.m0 (
id INTEGER
, start_dt DATE
, atc CHAR(1)
);
INSERT INTO test0.m0 VALUES
('1','2000-03-01','A')
,('1','2000-04-01','A')
,('1','2000-08-01','A')
,('1','2000-08-01','B')
,('1','2000-09-01','A')
,('1','2000-10-01','B')
;
Here is the code:
SELECT c.*, m.start_dt, m.atc
, (SELECT MIN(c.cons_dt)
FROM test0.c0 c2
WHERE c2.id = c.id
AND c2.cons_dt BETWEEN c.cons_dt + INTERVAL 1 DAY AND c.cons_dt + INTERVAL 40 DAY
) AS stop_dt
FROM test0.c0 c
LEFT OUTER JOIN test0.m0 m ON c.id = m.id AND c.cons_dt = m.start_dt
Here is the output:
+----+------------+------------+-----+---------+
| id | cons_dt | start_dt | atc | stop_dt |
+----+------------+------------+-----+---------+
| 1 | 2000-03-01 | 2000-03-01 | A | \N |
+----+------------+------------+-----+---------+
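I expected 13 rows, each with or without a stop date depending on whether the condition is met. Is something wrong with the code? I know I could use window functions, but they are not suitable for this kind of data. Consider this a simplified dataset: the original has many ids and various atc values.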
Update 1: Here is code that gives the correct result:
SELECT c.*, m.start_dt, m.atc, MIN(c2.cons_dt) AS stop_dt
FROM test0.c0 c
LEFT OUTER JOIN test0.m0 m ON c.id = m.id AND c.cons_dt = m.start_dt
LEFT JOIN test0.c0 c2 ON c.id = c2.id
AND c2.cons_dt BETWEEN c.cons_dt + INTERVAL 1 DAY AND c.cons_dt + INTERVAL 30 DAY
GROUP BY c.id, c.cons_dt, m.start_dt, m.atc
;
Here is the table I get, which is also what I expect:
+----+------------+------------+-----+------------+
| id | cons_dt | start_dt | atc | stop_dt |
+----+------------+------------+-----+------------+
| 1 | 2000-01-01 | \N | \N | \N |
| 1 | 2000-02-01 | \N | \N | 2000-03-01 |
| 1 | 2000-03-01 | 2000-03-01 | A | \N |
| 1 | 2000-04-01 | 2000-04-01 | A | 2000-05-01 |
| 1 | 2000-05-01 | \N | \N | \N |
| 1 | 2000-06-01 | \N | \N | 2000-07-01 |
| 1 | 2000-07-01 | \N | \N | \N |
| 1 | 2000-08-01 | 2000-08-01 | A | \N |
| 1 | 2000-08-01 | 2000-08-01 | B | \N |
| 1 | 2000-09-01 | 2000-09-01 | A | 2000-10-01 |
| 1 | 2000-10-01 | 2000-10-01 | B | \N |
| 1 | 2000-11-01 | \N | \N | 2000-12-01 |
| 1 | 2000-12-01 | \N | \N | \N |
+----+------------+------------+-----+------------+
I just want to understand why the subquery does not work. Another interesting question is which version performs better.
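One way to answer the performance question empirically, sketched under the assumption of MySQL 8.0.18 or later (the version that introduced EXPLAIN ANALYZE): prefix each candidate query with EXPLAIN ANALYZE, which actually executes it and reports row counts and times per plan step. On a 12-row table the numbers say little, so this is only meaningful on the full dataset.

```sql
-- Executes the Update 1 query and prints the chosen plan with actual timings.
-- Assumes the test0.c0 and test0.m0 tables created above.
EXPLAIN ANALYZE
SELECT c.*, m.start_dt, m.atc, MIN(c2.cons_dt) AS stop_dt
  FROM test0.c0 c
  LEFT OUTER JOIN test0.m0 m ON c.id = m.id AND c.cons_dt = m.start_dt
  LEFT JOIN test0.c0 c2 ON c.id = c2.id
   AND c2.cons_dt BETWEEN c.cons_dt + INTERVAL 1 DAY AND c.cons_dt + INTERVAL 30 DAY
 GROUP BY c.id, c.cons_dt, m.start_dt, m.atc;
```

Running the same prefix on a correlated-subquery version (with the aggregate taken over the subquery's own alias) and comparing the reported actual times gives a like-for-like comparison on your own data.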
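Update 2: slaakso pointed out the problem: ONLY_FULL_GROUP_BY. When ONLY_FULL_GROUP_BY is enabled (the default), the engine restricts how aggregate functions can be mixed with nonaggregated columns. This is confusing, though, because the following works: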
SELECT c.* ,m.atc ,m.start_dt
, (SELECT COUNT(*)
FROM test0.c0 c2
LEFT OUTER JOIN test0.m0 m2 ON c2.id = m2.id AND c2.cons_dt = m2.start_dt
WHERE c2.id = c.id
AND m2.atc <=> m.atc
AND c.cons_dt > c2.cons_dt
) + 1 AS counter
FROM test0.c0 c
LEFT OUTER JOIN test0.m0 m ON c.id = m.id AND c.cons_dt = m.start_dt
ORDER BY m.atc, c.cons_dt, counter
;
Here is the output:
+----+------------+-----+------------+---------+
| id | cons_dt | atc | start_dt | counter |
+----+------------+-----+------------+---------+
| 1 | 2000-01-01 | \N | \N | 1 |
| 1 | 2000-02-01 | \N | \N | 2 |
| 1 | 2000-05-01 | \N | \N | 3 |
| 1 | 2000-06-01 | \N | \N | 4 |
| 1 | 2000-07-01 | \N | \N | 5 |
| 1 | 2000-11-01 | \N | \N | 6 |
| 1 | 2000-12-01 | \N | \N | 7 |
| 1 | 2000-03-01 | A | 2000-03-01 | 1 |
| 1 | 2000-04-01 | A | 2000-04-01 | 2 |
| 1 | 2000-08-01 | A | 2000-08-01 | 3 |
| 1 | 2000-09-01 | A | 2000-09-01 | 4 |
| 1 | 2000-08-01 | B | 2000-08-01 | 1 |
| 1 | 2000-10-01 | B | 2000-10-01 | 2 |
+----+------------+-----+------------+---------+
Why does the first query not work while the second one does? How do the two queries differ?
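When you mix plain columns with aggregate functions, you need a GROUP BY clause. In the first query, MIN(c.cons_dt) takes its argument from the outer alias c instead of c2, so the aggregate belongs to the outer query: the outer SELECT becomes an aggregate query without GROUP BY, and the 13 joined rows collapse into one. In the second query, COUNT(*) has no outer reference, so it is evaluated inside the subquery once per outer row and the outer query is not aggregated.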
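A minimal sketch of the subquery version with the aggregate scoped to the subquery, assuming the test0.c0 and test0.m0 tables from the question. The only change is that MIN() now takes c2.cons_dt, so it is computed per outer row and the outer query stays a plain 13-row join. It keeps the 40-day window from the first query; with the same window length it should return the same rows as the Update 1 GROUP BY query on this sample data.

```sql
-- Same query as in the question, but MIN() aggregates the subquery's own
-- alias c2, so the aggregate is evaluated inside the subquery.
SELECT c.*, m.start_dt, m.atc
     , (SELECT MIN(c2.cons_dt)              -- c2, not c
          FROM test0.c0 c2
         WHERE c2.id = c.id
           AND c2.cons_dt BETWEEN c.cons_dt + INTERVAL 1 DAY
                              AND c.cons_dt + INTERVAL 40 DAY
       ) AS stop_dt
  FROM test0.c0 c
  LEFT OUTER JOIN test0.m0 m ON c.id = m.id AND c.cons_dt = m.start_dt
 ORDER BY c.id, c.cons_dt, m.atc;
```

Because the outer SELECT list no longer contains an aggregate, this runs under ONLY_FULL_GROUP_BY without a GROUP BY clause.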
Make sure your server has the ONLY_FULL_GROUP_BY mode set, since it catches many incorrect queries like this one.
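A minimal sketch for checking the current mode and enabling ONLY_FULL_GROUP_BY for the current session only (making it permanent would go in the server configuration or SET GLOBAL, which needs the corresponding privilege). With the mode on, the first query from the question should be rejected with an error about a nonaggregated column instead of silently returning a single row.

```sql
-- Show the SQL modes active for this session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY only if it is missing. NULLIF turns an empty
-- mode string into NULL, which CONCAT_WS skips, avoiding a stray comma.
SET SESSION sql_mode = IF(
    FIND_IN_SET('ONLY_FULL_GROUP_BY', @@SESSION.sql_mode) > 0,
    @@SESSION.sql_mode,
    CONCAT_WS(',', NULLIF(@@SESSION.sql_mode, ''), 'ONLY_FULL_GROUP_BY')
);
```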