假设我有一个按日期排序的表:
+-------------+--------+
| DATE | VALUE |
+-------------+--------+
| 01-09-2020 | 5 |
| 01-15-2020 | 5 |
| 01-17-2020 | 5 |
| 02-03-2020 | 8 |
| 02-13-2020 | 8 |
| 02-20-2020 | 8 |
| 02-23-2020 | 5 |
| 02-25-2020 | 5 |
| 02-28-2020 | 3 |
| 03-13-2020 | 3 |
| 03-18-2020 | 3 |
+-------------+--------+
我想按给定日期范围内的值变化进行分组,并添加一个每次递增的值作为添加的列来表示这一点。
我尝试了许多不同的操作,例如使用lag
函数:
SELECT value, value - lag(value) over (order by date) as count
GROUP BY value
总之,我想拿上面的桌子,看起来像这样:
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 01-15-2020 | 5 | 1 |
| 01-17-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-13-2020 | 8 | 2 |
| 02-20-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-25-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
| 03-13-2020 | 3 | 4 |
| 03-18-2020 | 3 | 4 |
+-------------+--------+-------+
我最终希望将所有这些内容都放在一张小桌子中,每个都有最早的日期。
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
+-------------+--------+-------+
任何帮助将不胜感激
您可以递归使用lag()
,然后使用row_number()
分析函数:
WITH t2 AS
(
SELECT LAG(value,1,value-1) OVER (ORDER BY date) as lg,
t.*
FROM t
)
SELECT t2.date,t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
FROM t2
WHERE value - lg != 0
并过滤掉那些函数返回的值之间的不相等。
您可以使用滞后和累加和以及子查询:
SELECT value,
SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date)
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
FROM t
) t
Here是db <>小提琴。