SQL join: cumulative condition across all values on both sides (Presto / AWS Athena)

Question (votes: 1, answers: 1)

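I've been stuck on this seemingly simple problem without finding a solution. Suppose I have one table with a list of dates and another with phone numbers, names, and dates. I need a final result that contains every name paired with every date, plus a third column showing the number of unique phone numbers that appear on any date equal to or later than the date in that row. Here is an example: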

t1
+------------+
|    date    |
+------------+
| 01/01/2020 |
| 01/02/2020 |
| 01/03/2020 |
| 01/04/2020 |
| 01/05/2020 |
| 01/06/2020 |
| 01/07/2020 |
| 01/08/2020 |
+------------+

t2
+------+------------+--------------+
| name |    date    | phone_number |
+------+------------+--------------+
| John | 01/01/2020 |          123 |
| Mike | 01/02/2020 |          456 |
| Mike | 01/03/2020 |          789 |
| John | 01/04/2020 |          999 |
| Mike | 01/05/2020 |          111 |
| John | 01/06/2020 |          777 |
| Mike | 01/07/2020 |          123 |
| Mike | 01/08/2020 |          456 |
| John | 01/01/2020 |          789 |
| John | 01/02/2020 |          789 |
| Mike | 01/03/2020 |          789 |
| John | 01/04/2020 |          789 |
+------+------------+--------------+

The result I want:

+------+------------+-----------------------------------------------------------------+
| Name |   Month    | Cumulative Unique Numbers (Unique Numbers on any date >= Month) |
+------+------------+-----------------------------------------------------------------+
| John | 01/01/2020 |                                                               4 |
| John | 01/02/2020 |                                                               3 |
| John | 01/03/2020 |                                                               3 |
| John | 01/04/2020 |                                                               3 |
| John | 01/05/2020 |                                                               1 |
| John | 01/06/2020 |                                                               1 |
| John | 01/07/2020 |                                                               0 |
| John | 01/08/2020 |                                                               0 |
| Mike | 01/01/2020 |                                                               4 |
| Mike | 01/02/2020 |                                                               4 |
| Mike | 01/03/2020 |                                                               4 |
| Mike | 01/04/2020 |                                                               3 |
| Mike | 01/05/2020 |                                                               3 |
| Mike | 01/06/2020 |                                                               2 |
| Mike | 01/07/2020 |                                                               2 |
| Mike | 01/08/2020 |                                                               1 |
+------+------------+-----------------------------------------------------------------+
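To make the cumulative rule concrete with the sample data: for John on 01/02/2020, the qualifying rows are his calls on 01/02/2020, 01/04/2020 and 01/06/2020, so the count is the size of the union of those numbers:

$$
\{789\} \cup \{999,\ 789\} \cup \{777\} = \{789,\ 999,\ 777\}, \qquad \lvert\{789,\ 999,\ 777\}\rvert = 3
$$

From 01/07/2020 onward John has no rows at all, so those rows should show 0 rather than disappear from the result.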

I've tried many approaches; this is the closest I've come:

SELECT * FROM t1
LEFT OUTER JOIN
    (SELECT t1.date, COUNT(DISTINCT phone_number) count, name
     FROM t1
     LEFT OUTER JOIN t2
         ON t1.date < t2.date
     GROUP BY t1.date, t2.name
     ORDER BY 2 DESC) temp
ON t1.date = temp.date

I'm still missing rows in the final result.

This is what I get:

+------+------------+-------+
| name |    date    | count |
+------+------------+-------+
| null | 2020-08-01 |     0 |
| John | 2020-01-01 |     3 |
| John | 2020-02-01 |     3 |
| John | 2020-03-01 |     3 |
| John | 2020-04-01 |     1 |
| John | 2020-05-01 |     1 |
| Mike | 2020-01-01 |     4 |
| Mike | 2020-02-01 |     4 |
| Mike | 2020-03-01 |     3 |
| Mike | 2020-04-01 |     3 |
| Mike | 2020-05-01 |     2 |
| Mike | 2020-06-01 |     2 |
| Mike | 2020-07-01 |     1 |
+------+------------+-------+
sql presto amazon-athena
1 Answer

Votes: 0

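Using a calendar-table approach, we can build a reference table made up of every name and every date, and then join it to the second table, which holds the actual data: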
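A minimal sketch of that approach in Presto / Athena SQL follows; it is one way to write it, not necessarily the answerer's exact query. It assumes the dates are first-of-month values written as DD/MM/YYYY (which matches the 2020-MM-01 values in the asker's output), that the list of names can be taken from `t2`, and it inlines the sample data with `VALUES` so it runs without creating tables. The CTE name `calendar` and the column alias `cumulative_unique_numbers` are illustrative choices.

```sql
-- Sketch only: assumes DD/MM/YYYY dates, i.e. 01/02/2020 = 2020-02-01.
-- With real tables, drop the t1 and t2 CTEs and query the tables directly.
WITH t1 (date) AS (
    VALUES DATE '2020-01-01', DATE '2020-02-01', DATE '2020-03-01', DATE '2020-04-01',
           DATE '2020-05-01', DATE '2020-06-01', DATE '2020-07-01', DATE '2020-08-01'
),
t2 (name, date, phone_number) AS (
    VALUES ('John', DATE '2020-01-01', 123),
           ('Mike', DATE '2020-02-01', 456),
           ('Mike', DATE '2020-03-01', 789),
           ('John', DATE '2020-04-01', 999),
           ('Mike', DATE '2020-05-01', 111),
           ('John', DATE '2020-06-01', 777),
           ('Mike', DATE '2020-07-01', 123),
           ('Mike', DATE '2020-08-01', 456),
           ('John', DATE '2020-01-01', 789),
           ('John', DATE '2020-02-01', 789),
           ('Mike', DATE '2020-03-01', 789),
           ('John', DATE '2020-04-01', 789)
),
-- Reference ("calendar") table: every name paired with every date.
calendar AS (
    SELECT n.name, d.date
    FROM (SELECT DISTINCT name FROM t2) AS n
    CROSS JOIN t1 AS d
)
SELECT c.name,
       c.date AS month,
       -- Unmatched LEFT JOIN rows carry a NULL phone_number, which
       -- COUNT(DISTINCT ...) ignores, so such groups report 0.
       COUNT(DISTINCT t2.phone_number) AS cumulative_unique_numbers
FROM calendar AS c
LEFT JOIN t2
    ON t2.name = c.name      -- name match lives in ON, so no grid row is dropped
   AND t2.date >= c.date     -- ">=" includes calls on the row's own date
GROUP BY c.name, c.date
ORDER BY c.name, c.date;
```

With the sample data this should give 4, 3, 3, 3, 1, 1, 0, 0 for John and 4, 4, 4, 3, 3, 2, 2, 1 for Mike, matching the desired table. The original attempt lost rows for two reasons: `t1.date < t2.date` excludes calls made on the row's own date, and because `name` came from `t2` on the outer side of the join, a date with no later calls for a given person produced no row for that person (or a single `NULL`-name row, as seen for 2020-08-01). Building the name × date grid first and putting both conditions in the `ON` clause avoids both problems.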

© www.soinside.com 2019 - 2024. All rights reserved.