SQL: group by changes in a column value

Question · Votes: -2 · Answers: 2

Suppose I have a table ordered by date:

+-------------+--------+
|    DATE     |  VALUE |
+-------------+--------+
|  01-09-2020 |   5    |
|  01-15-2020 |   5    |
|  01-17-2020 |   5    |
|  02-03-2020 |   8    |
|  02-13-2020 |   8    |
|  02-20-2020 |   8    |
|  02-23-2020 |   5    |
|  02-25-2020 |   5    |
|  02-28-2020 |   3    |
|  03-13-2020 |   3    |
|  03-18-2020 |   3    |
+-------------+--------+

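I want to group the rows by changes in VALUE over the given date range, and add a column holding a number that increments each time the value changes.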

I have tried many different things, for example the lag function:

SELECT value, value - lag(value) over (order by date) as count
FROM t
GROUP BY value

In short, I want to turn the table above into this:

+-------------+--------+-------+
|    DATE     |  VALUE | COUNT |
+-------------+--------+-------+
|  01-09-2020 |   5    |   1   |
|  01-15-2020 |   5    |   1   |
|  01-17-2020 |   5    |   1   |
|  02-03-2020 |   8    |   2   |
|  02-13-2020 |   8    |   2   |
|  02-20-2020 |   8    |   2   |
|  02-23-2020 |   5    |   3   |
|  02-25-2020 |   5    |   3   |
|  02-28-2020 |   3    |   4   |
|  03-13-2020 |   3    |   4   |
|  03-18-2020 |   3    |   4   |
+-------------+--------+-------+

Ultimately, I want to collapse all of this into a smaller table, with one row per group showing its earliest date:

+-------------+--------+-------+
|    DATE     |  VALUE | COUNT |
+-------------+--------+-------+
|  01-09-2020 |   5    |   1   |
|  02-03-2020 |   8    |   2   |
|  02-23-2020 |   5    |   3   |
|  02-28-2020 |   3    |   4   |
+-------------+--------+-------+
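To pin down the intended semantics independently of any SQL engine, here is a minimal plain-Python sketch (an illustration, not part of the original question). It assumes the rows are already sorted by date and uses ISO date strings so that text order matches date order; the names ROWS, number_runs and collapse_runs are hypothetical. A new run (an "island") starts whenever the value differs from the previous row's value, so the two separate runs of 5 get different numbers.

```python
from itertools import groupby

# Sample data from the question as (date, value) pairs, already sorted by date.
# ISO dates are an assumption so that string order matches date order.
ROWS = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]


def number_runs(rows):
    """Attach a run number that increments each time the value changes."""
    result = []
    count = 0
    prev = None
    for date, value in rows:
        if count == 0 or value != prev:
            count += 1  # first row, or the value changed: a new run starts here
        result.append((date, value, count))
        prev = value
    return result


def collapse_runs(rows):
    """Keep one row per run: its earliest date, its value and its run number."""
    # groupby only groups *consecutive* rows with equal keys, i.e. exactly one run.
    return [
        (next(group)[0], value, count)
        for count, (value, group) in enumerate(groupby(rows, key=lambda r: r[1]), start=1)
    ]
```

Because the input is sorted by date, the first row of each group is also its earliest date, which matches the second table above.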

Any help would be greatly appreciated.

sql scala apache-spark lag gaps-and-islands
2 answers
0
votes

You can use LAG() inside a CTE and then apply the ROW_NUMBER() analytic function:

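WITH t2 AS
(
-- previous row's value; the first row defaults to value-1 so it always counts as a change
SELECT LAG(value,1,value-1) OVER (ORDER BY date) as lg,
       t.*
  FROM t
)
-- keep only rows where the value changed, then number those rows
SELECT t2.date,t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
  FROM t2
 WHERE value - lg != 0
 ORDER BY t2.date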


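The WHERE clause keeps only the rows whose value differs from the value returned by LAG() (the default of value-1 guarantees the first row qualifies). ROW_NUMBER() is evaluated after the WHERE filter, so it numbers only the remaining rows and yields the collapsed table directly.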
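A minimal sketch for checking this answer's query end to end, assuming Python's built-in sqlite3 module as a stand-in engine (SQLite 3.25 or later supports window functions; this does not exercise Spark SQL) and ISO date strings so that text order matches date order. The table t and its columns date and value follow the answer; the helper first_row_per_run is hypothetical.

```python
import sqlite3

# Sample data from the question; ISO dates are an assumption.
ROWS = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]

# The answer's query: LAG() defaults to value - 1 on the first row, the WHERE
# clause keeps only change rows, and ROW_NUMBER() numbers the rows that remain.
QUERY = """
WITH t2 AS (
    SELECT LAG(value, 1, value - 1) OVER (ORDER BY date) AS lg, t.*
      FROM t
)
SELECT t2.date, t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) AS count
  FROM t2
 WHERE value - lg != 0
 ORDER BY t2.date
"""


def first_row_per_run(rows):
    """Load rows into an in-memory table t and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (date TEXT, value INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Calling first_row_per_run(ROWS) should give one row per run with its earliest date, matching the question's second table. Note that this approach only produces the collapsed table; it does not label every original row.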


-1
votes

You can use LAG() together with a cumulative sum over a subquery:

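SELECT date,
       value,
       -- add 1 whenever the value differs from the previous row's value;
       -- on the first row prev_value is NULL, so the comparison fails and it also adds 1
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
      FROM t
     ) t
ORDER BY date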
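A minimal sketch that runs this cumulative-sum approach and then adds the collapsing step the question asks for (one row per group with its earliest date, via GROUP BY and MIN). It assumes Python's built-in sqlite3 module as a stand-in engine (SQLite 3.25 or later; not Spark SQL) and ISO date strings. The alias grp and the helper run are hypothetical names chosen for this sketch.

```python
import sqlite3

# Sample data from the question; ISO dates are an assumption.
ROWS = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]

# Running sum of "value changed" flags: every change starts a new group id.
NUMBERED = """
SELECT date,
       value,
       SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS grp
  FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) AS prev_value FROM t) AS s
"""

# Collapsing step: one row per group, keeping its earliest date.
COLLAPSED = f"""
SELECT MIN(date) AS date, value, grp
  FROM ({NUMBERED}) AS n
 GROUP BY grp, value
 ORDER BY grp
"""


def run(query, rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (date TEXT, value INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

run(NUMBERED + " ORDER BY date", ROWS) labels every row (the question's first target table), and run(COLLAPSED, ROWS) produces the collapsed table. Grouping by grp together with value is safe because each group id covers exactly one value; including value simply lets it appear in the output without an aggregate.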


© www.soinside.com 2019 - 2024. All rights reserved.