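I have a table containing Covid data by county. I need to go through the table and, for each county, calculate the difference in cases and deaths from the previous day. For example, I know that for Chambers the total cases at 15:00 on 3/20 is 4 and the total on 3/19 was 1, so the difference is 3. For every row in the table I need to insert COUNTYNAME, DateReported, and the difference in cases into a temp table. This is, of course, made-up data.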
CID | COUNTYNAME | Cases | Deaths | DateReported |
---------------------------------------------------------------------
1 | Baldwin | 1 | 0 | 2020-03-19 12:00:00.000 |
2 | Cook | 1 | 0 | 2020-03-19 12:00:00.000 |
3 | Chambers | 1 | 0 | 2020-03-19 12:00:00.000 |
4 | Total | 3 | 0 | 2020-03-19 12:00:00.000 |
5 | Baldwin | 1 | 0 | 2020-03-19 15:00:00.000 |
6 | Cook | 2 | 0 | 2020-03-19 15:00:00.000 |
7 | Chambers | 4 | 0 | 2020-03-19 15:00:00.000 |
8 | Elmore | 1 | 0 | 2020-03-19 15:00:00.000 |
9 | Total | 8 | 0 | 2020-03-19 15:00:00.000 |
10 | Baldwin | 1 | 0 | 2020-03-20 12:00:00.000 |
11 | Cook | 2 | 0 | 2020-03-20 12:00:00.000 |
12 | Chambers | 4 | 0 | 2020-03-20 12:00:00.000 |
13 | Clarke | 1 | 0 | 2020-03-20 12:00:00.000 |
14 | Elmore | 1 | 0 | 2020-03-20 12:00:00.000 |
15 | Total | 9 | 0 | 2020-03-20 12:00:00.000 |
16 | Baldwin | 1 | 0 | 2020-03-20 15:00:00.000 |
17 | Cook | 2 | 0 | 2020-03-20 15:00:00.000 |
18 | Chambers | 4 | 0 | 2020-03-20 15:00:00.000 |
19 | Clarke | 1 | 0 | 2020-03-20 15:00:00.000 |
20 | Elmore | 2 | 0 | 2020-03-20 15:00:00.000 |
21 | Total | 10 | 0 | 2020-03-20 15:00:00.000 |
Here is what I have. It seems to get me close to what I need, but my table has 50,000 rows and this takes more than 5 minutes to execute.
CREATE TABLE #tempTable1 (
CountyName varchar(50),
DateReported datetime,
DiffVal int
)
DECLARE @RowCount INT
,@HourVal int = datepart(hh,getdate())
,@PreviousDayVal date = dateadd(DD, -1, cast(getdate() as date))
--Get the number of rows in our table to loop through.
SET @RowCount = (SELECT COUNT(COUNTYNAME) FROM myCovidTable)
DECLARE @I INT
SET @I = 1
WHILE (@I <= @RowCount)
BEGIN
DECLARE @iCountyName VARCHAR(50)
,@iDateReported datetime
,@iDiffVal int
,@CountyNameVal varchar(50) = (SELECT COUNTYNAME FROM myCovidTable WHERE CID = @I)
,@CurrentDateVal datetime = (SELECT dateadd(DD, 0, cast(DateReported as date)) FROM myCovidTable WHERE CID = @I); -- The current row's DateReported value
--The date reported isn't always constant so I need to parse the date
WITH tempTable2
AS (SELECT Cases,
Cast(DateReported AS DATE) AS DateField
FROM myCovidTable
WHERE Datepart(HH, ( DateReported )) = @HourVal
AND COUNTYNAME = @CountyNameVal)
SELECT @iDiffVal = (
SELECT SUM (Cases)
FROM tempTable2
WHERE DateField = @CurrentDateVal) -
(SELECT SUM (Cases)
FROM tempTable2
WHERE DateField = @PreviousDayVal)
-- Then we insert it into the table
SET @iDateReported = (SELECT DateReported FROM myCovidTable WHERE CID = @I)
SET @iCountyName = (SELECT COUNTYNAME FROM myCovidTable WHERE CID = @I)
SET @I = @I + 1
INSERT into #tempTable1 select @iCountyName as CountyName, @iDateReported as DateReported, @iDiffVal as DiffValue
END
SELECT * FROM #tempTable1
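The result should be a table with three columns, CountyName, DateReported, and DiffVal, showing for each row (report date) the per-county difference in cases from the same time on the previous day.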
One option is a self-join:
insert into #tempTable1 (countyname, datereported, diffval)
select
t.countyname,
t.datereported,
t.cases - coalesce(t1.cases, 0)
from myCovidTable t
left join myCovidTable t1
on t1.countyname = t.countyname
and t1.datereported = dateadd(day, -1, t.datereported)
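Note that this join only matches when the earlier row's timestamp is exactly 24 hours before the current one. The question mentions that the report time is not always constant, so one possible variant compares the calendar date and the hour instead. This is a minimal sketch, assuming at most one report per county per hour on any given day; the table variable `@covid` and its three rows are made up for illustration and stand in for the real table.

```sql
-- Hypothetical sample data standing in for myCovidTable.
DECLARE @covid TABLE (
    CID          INT         NOT NULL PRIMARY KEY,
    CountyName   VARCHAR(50) NOT NULL,
    Cases        INT         NOT NULL,
    Deaths       INT         NOT NULL,
    DateReported DATETIME    NOT NULL
);

INSERT INTO @covid (CID, CountyName, Cases, Deaths, DateReported) VALUES
    (1, 'Chambers', 1, 0, '2020-03-19T15:00:00'),
    (2, 'Chambers', 4, 0, '2020-03-20T15:05:00'),  -- reported a few minutes late
    (3, 'Cook',     2, 0, '2020-03-20T15:00:00');  -- no row for the previous day

SELECT
    t.CountyName,
    t.DateReported,
    t.Cases  - COALESCE(p.Cases, 0)  AS CasesDiff,
    t.Deaths - COALESCE(p.Deaths, 0) AS DeathsDiff
FROM @covid AS t
LEFT JOIN @covid AS p
    ON  p.CountyName = t.CountyName
    -- previous calendar day, ignoring the time of day
    AND CAST(p.DateReported AS DATE) = DATEADD(DAY, -1, CAST(t.DateReported AS DATE))
    -- same hour of the day, ignoring minutes and seconds
    AND DATEPART(HOUR, p.DateReported) = DATEPART(HOUR, t.DateReported);
```

With these rows, the Chambers report on 3/20 is matched to the 3/19 15:00 row even though it arrived at 15:05, giving a case difference of 3. Because of the `COALESCE(..., 0)`, a county with no row on the previous day shows its full count as the difference; drop the `COALESCE` if `NULL` is preferable there.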
You can use window functions:
CREATE TABLE #stat
(
CID INT NOT NULL PRIMARY KEY,
CountyName VARCHAR(50) NOT NULL,
DateReported DATETIME NOT NULL,
Cases INT NOT NULL,
Deaths INT NOT NULL
);
-- INSERT INTO #stat ...
CREATE NONCLUSTERED INDEX IX_Stat ON #stat (CountyName, DateReported) INCLUDE (Deaths, Cases);
SELECT CountyName,
DateReported,
CasesDiff = Cases - LAG(Cases) OVER (PARTITION BY CountyName ORDER BY DateReported),
DeathsDiff = Deaths - LAG(Deaths) OVER (PARTITION BY CountyName ORDER BY DateReported)
FROM #stat;
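With that ordering, `LAG` returns the previous report for the county, whatever its time. In the sample data that would be the 12:00 reading on the same day, not the reading from the day before. If the comparison should be against the same hour, one possible adjustment is to add the hour to the partition. This is a sketch, assuming SQL Server 2012 or later (for `LAG`), one report per county per hour each day, and no skipped days; if a day is missing, `LAG` reaches back to the most recent older day instead. The table variable `@stat` and its rows are illustrative only.

```sql
-- Hypothetical sample data; the real query would read from #stat.
DECLARE @stat TABLE (
    CID          INT         NOT NULL PRIMARY KEY,
    CountyName   VARCHAR(50) NOT NULL,
    DateReported DATETIME    NOT NULL,
    Cases        INT         NOT NULL,
    Deaths       INT         NOT NULL
);

INSERT INTO @stat (CID, CountyName, DateReported, Cases, Deaths) VALUES
    (1, 'Chambers', '2020-03-19T12:00:00', 1, 0),
    (2, 'Chambers', '2020-03-19T15:00:00', 4, 0),
    (3, 'Chambers', '2020-03-20T12:00:00', 4, 0),
    (4, 'Chambers', '2020-03-20T15:00:00', 4, 0);

SELECT CountyName,
       DateReported,
       -- Partitioning by hour makes LAG compare 12:00 with 12:00 and 15:00 with 15:00.
       CasesDiff  = Cases  - LAG(Cases)  OVER (PARTITION BY CountyName, DATEPART(HOUR, DateReported)
                                               ORDER BY DateReported),
       DeathsDiff = Deaths - LAG(Deaths) OVER (PARTITION BY CountyName, DATEPART(HOUR, DateReported)
                                               ORDER BY DateReported)
FROM @stat
ORDER BY CountyName, DateReported;
```

The first report in each partition has no earlier row, so its difference is `NULL`. Writing `LAG(Cases, 1, 0)` would treat the missing earlier value as 0 instead.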