I have a sample dataset in a SQL table, and I need to group the data in a specific way.
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 0 | 2023-01-02 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 8 | 200 | 2023-01-06 |
| 8 | 200 | 2023-01-07 |
| 8 | 200 | 2023-01-08 |
| 8 | 200 | 2023-01-09 |
| 778 | 200 | 2023-10-25 |
| 778 | 200 | 2023-10-26 |
+-------+----------+----------------+
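This is the final result I want: consecutive rows with the same amount, taken in ascending date order, are grouped together, and the earliest date is kept for each group. The grouping must also be done separately for each ID. As shown below, the rows for ID 8 with DATE 2023-01-01 and 2023-01-02 are grouped together. The next record changes to SUM = 10, so it is treated as a separate row. After that, SUM = 10 changes back to SUM = 0, which should also be treated as a new row.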
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
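I tried the SQL query below, but I realized that the row with ID = 8, SUM = 0, DATE 2023-01-04 gets grouped together with the rows for 2023-01-01 and 2023-01-02 and goes missing from the result: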
SELECT
[ID],
[SUM],
MIN([DATE]) AS [Date]
FROM [dbo].[test]
GROUP BY [ID], [SUM]
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
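Can someone help me come up with a smart solution that takes the ascending DATE column into account when grouping, or that produces a new row whenever the value changes?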
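This is a gaps-and-islands problem. To solve it, use the LAG() function to get the previous date for each id and sum, then select only the entries where date_diff <> 1. The first row of each partition has no previous date, so COALESCE maps its difference to 0 and the row is kept.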
WITH cte as (
SELECT *,
COALESCE(DATEDIFF(DAY,
LAG([date]) OVER (PARTITION BY id, sum ORDER BY [date]),
[date]
), 0) AS date_diff
FROM mytable
)
SELECT id, sum, [date]
FROM cte
WHERE date_diff <> 1
ORDER BY [date]
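The date_diff check above detects a new run through a gap in the dates, so it relies on each id having one row per day. As a minimal sketch that does not depend on that assumption, you can compare each row's sum with the previous row's sum within the same id. The names sample_data, flagged and prev_sum are made up for this illustration, and the sample rows are inlined only so the query is self-contained:

```sql
-- Sketch only: sample_data stands in for your real table (hypothetical name).
WITH sample_data AS (
    SELECT *
    FROM (VALUES
        (8,   0,   CAST('2023-01-01' AS date)),
        (8,   0,   '2023-01-02'),
        (8,   10,  '2023-01-03'),
        (8,   0,   '2023-01-04'),
        (8,   200, '2023-01-05'),
        (8,   200, '2023-01-06'),
        (8,   200, '2023-01-07'),
        (8,   200, '2023-01-08'),
        (8,   200, '2023-01-09'),
        (778, 200, '2023-10-25'),
        (778, 200, '2023-10-26')
    ) AS v (id, [sum], [date])
),
flagged AS (
    SELECT id, [sum], [date],
           -- Previous amount for the same id, in date order (NULL on the first row)
           LAG([sum]) OVER (PARTITION BY id ORDER BY [date]) AS prev_sum
    FROM sample_data
)
SELECT id, [sum], [date]
FROM flagged
-- Keep a row only when it starts a new run: first row of the id, or the amount changed
WHERE prev_sum IS NULL OR prev_sum <> [sum]
ORDER BY id, [date];
```

On the sample data this keeps five rows: (8, 0, 2023-01-01), (8, 10, 2023-01-03), (8, 0, 2023-01-04), (8, 200, 2023-01-05) and (778, 200, 2023-10-25).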
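If you only want the first row of each (id, sum) group, you can do this. Note that it partitions by id and sum only, so an amount that reappears later for the same id (such as SUM = 0 on 2023-01-04) is not returned as a new row: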
select id, sum, [date]
from (
select t.*, lag([date]) over (partition by id, sum order by [date]) as prev_date
from mytable t
) x
where prev_date is null
order by [date]
Result:
id sum date
---- ---- ----------
8 0 2023-01-01
8 10 2023-01-03
8 200 2023-01-05
778 200 2023-10-25
See the running example on db<>fiddle.
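If you would rather keep the GROUP BY / MIN shape of the original attempt, a common gaps-and-islands sketch is to derive a run identifier from the difference of two ROW_NUMBER() values and add it to the grouping. The names sample_data, numbered, grp and first_date are assumptions made for this illustration, not part of the original queries:

```sql
-- Sketch only: sample_data stands in for your real table (hypothetical name).
WITH sample_data AS (
    SELECT *
    FROM (VALUES
        (8,   0,   CAST('2023-01-01' AS date)),
        (8,   0,   '2023-01-02'),
        (8,   10,  '2023-01-03'),
        (8,   0,   '2023-01-04'),
        (8,   200, '2023-01-05'),
        (8,   200, '2023-01-06'),
        (778, 200, '2023-10-25'),
        (778, 200, '2023-10-26')
    ) AS v (id, [sum], [date])
),
numbered AS (
    SELECT id, [sum], [date],
           -- Position within the id minus position within (id, sum):
           -- the difference stays constant inside one uninterrupted run
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date])
         - ROW_NUMBER() OVER (PARTITION BY id, [sum] ORDER BY [date]) AS grp
    FROM sample_data
)
SELECT id, [sum], MIN([date]) AS first_date
FROM numbered
GROUP BY id, [sum], grp   -- grp separates the two SUM = 0 runs of id 8
ORDER BY id, first_date;
```

Because each run is now a real group, other aggregates such as MAX([date]) or COUNT(*) can be added to the SELECT list to get the end date or length of each run.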