Rank only the observations that exceed a certain value


I have the following table:

DROP TABLE IF EXISTS #df

CREATE TABLE #df 
(
    CommID VARCHAR(10),
    ProvID VARCHAR(5),
    VisitCount int,
    [% Score] INT,
    TimePeriod VARCHAR(10),
    Median_VisitCount INT,
    Average_VisitCount INT
);

INSERT INTO #df (CommID, ProvID, VisitCount, [% Score], TimePeriod, Median_VisitCount, Average_VisitCount)
VALUES
('AB345', '001', '65', .45, 'ThisYear', 48.5, 42),
('AB345', '002', '67', .64, 'ThisYear', 48.5, 42),
('AB345', '003', '32', .78, 'ThisYear', 48.5, 42),
('AB345', '004', '4', .32, 'ThisYear', 48.5, 42),
('AB345', '001', '23', .45, 'LastYear', 42.5, 41),
('AB345', '002', '56', .64, 'LastYear', 48.5, 41),
('AB345', '003', '31', .78, 'LastYear', 48.5, 41),
('AB345', '004', '54', .32, 'LastYear', 48.5, 41)

SELECT * FROM #df

I would like my final output to look like this:

DROP TABLE IF EXISTS #final

CREATE TABLE #final 
(
    CommID VARCHAR(10),
    ProvID VARCHAR(5),
    VisitCount int,
    [% Score] INT,
    TimePeriod VARCHAR(10),
    Median_VisitCount INT,
    Average_VisitCount INT,
    Highest INT,
    Lowest INT
);

INSERT INTO #final (CommID, ProvID, VisitCount, [% Score], TimePeriod, Median_VisitCount, Average_VisitCount, Highest, Lowest)
VALUES
('AB345', '001', '65', .45, 'ThisYear', 48.5, 42, 3, 1),
('AB345', '002', '67', .64, 'ThisYear', 48.5, 42, 2, 2),
('AB345', '003', '32', .78, 'ThisYear', 48.5, 42, 1, 3),
('AB345', '004', '4', .32, 'ThisYear', 48.5, 42, NULL, NULL),
('AB345', '001', '23', .45, 'LastYear', 42.5, 41, NULL, NULL),
('AB345', '002', '56', .64, 'LastYear', 48.5, 41, 1, 2),
('AB345', '003', '31', .78, 'LastYear', 48.5, 41, NULL, NULL),
('AB345', '004', '54', .32, 'LastYear', 48.5, 41, 2, 1)

SELECT * FROM #final

For a given CommID and TimePeriod, I want to rank [% Score], but only for rows where VisitCount >= Average_VisitCount.

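This is the code I wrote, but the ranking still takes into account rows below Average_VisitCount. I want any row whose VisitCount is less than Average_VisitCount to be left out of the ranking entirely: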

SELECT a.CommID
     , a.ProvID
     , a.VisitCount
     , a.[% Score]
     , CASE WHEN VisitCount >= a.Average_VisitCount 
            THEN RANK() OVER (PARTITION BY a.CommID, TimePeriod ORDER BY [a].[% Score] DESC) 
            ELSE NULL END AS Highest
     , CASE WHEN VisitCount >= a.Average_VisitCount 
            THEN RANK() OVER (PARTITION BY a.CommID, TimePeriod ORDER BY [a].[% Score]) 
            ELSE NULL END AS Lowest
     , a.TimePeriod
     , a.Median_VisitCount
     , a.Average_VisitCount 
FROM #df a 
ORDER BY a.CommID, a.TimePeriod, a.VisitCount DESC
sql-server rank
1 Answer

Your demo data declares [% Score] as INT, so I changed it to DECIMAL(5,2); otherwise every row would have a score of 0.
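You have already done most of the work. Basically, push the rows that fall outside your range to the bottom of the ordering so they do not interfere with the ranks you care about; then either leave them ranked at the bottom or have them show NULL: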

SELECT *,
       CASE WHEN VisitCount >= Average_VisitCount
            THEN DENSE_RANK() OVER (PARTITION BY CommID, TimePeriod
                                    ORDER BY CASE WHEN VisitCount >= Average_VisitCount THEN [% Score] ELSE 999 END)
       END AS [Rank]
  FROM #df

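The inner CASE expression ranks the excluded rows using the value 999 (an arbitrary out-of-range value), so they sort after every real score. The outer CASE expression makes the column return NULL for those rows.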
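The question asks for both a Highest and a Lowest rank. The same idea can be extended to cover both, as in the sketch below. It assumes [% Score] has been changed to DECIMAL(5,2) and that real scores always lie between 0 and 1, so that 999 and -1 are safe sentinels. It uses RANK() as in the question; DENSE_RANK() would work the same way if ties should not leave gaps.

```sql
SELECT *,
       -- Highest: rank 1 = largest score. Excluded rows get -1 so they sort last in DESC order.
       CASE WHEN VisitCount >= Average_VisitCount
            THEN RANK() OVER (PARTITION BY CommID, TimePeriod
                              ORDER BY CASE WHEN VisitCount >= Average_VisitCount
                                            THEN [% Score] ELSE -1 END DESC)
       END AS Highest,
       -- Lowest: rank 1 = smallest score. Excluded rows get 999 so they sort last in ASC order.
       CASE WHEN VisitCount >= Average_VisitCount
            THEN RANK() OVER (PARTITION BY CommID, TimePeriod
                              ORDER BY CASE WHEN VisitCount >= Average_VisitCount
                                            THEN [% Score] ELSE 999 END)
       END AS Lowest
  FROM #df
 ORDER BY CommID, TimePeriod, VisitCount DESC;
```

Note that the sentinel has to change with the sort direction: a high value for ascending order, a low one for descending. Otherwise the excluded rows would sort first and shift the ranks of the rows you care about.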

CommID  ProvID  VisitCount  % Score  TimePeriod  Median_VisitCount  Average_VisitCount  Rank
AB345   004     54          0.32     LastYear    48                 41                  1
AB345   002     56          0.64     LastYear    48                 41                  2
AB345   003     31          0.78     LastYear    48                 41                  NULL
AB345   001     23          0.45     LastYear    42                 41                  NULL
AB345   001     65          0.45     ThisYear    48                 42                  1
AB345   002     67          0.64     ThisYear    48                 42                  2
AB345   003     32          0.78     ThisYear    48                 42                  NULL
AB345   004     4           0.32     ThisYear    48                 42                  NULL
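The sentinel value 999 only works as long as no real score can reach it. One possible alternative, not part of the answer above, is to add the eligibility test to the PARTITION BY clause, so that qualifying and non-qualifying rows are ranked in separate groups and no magic number is needed. A minimal sketch, again assuming [% Score] is DECIMAL(5,2):

```sql
SELECT *,
       -- Rows are ranked only against other rows with the same eligibility flag (1 or 0);
       -- the outer CASE then hides the ranks computed for the non-qualifying group.
       CASE WHEN VisitCount >= Average_VisitCount
            THEN RANK() OVER (PARTITION BY CommID, TimePeriod,
                                           CASE WHEN VisitCount >= Average_VisitCount THEN 1 ELSE 0 END
                              ORDER BY [% Score] DESC)
       END AS Highest,
       CASE WHEN VisitCount >= Average_VisitCount
            THEN RANK() OVER (PARTITION BY CommID, TimePeriod,
                                           CASE WHEN VisitCount >= Average_VisitCount THEN 1 ELSE 0 END
                              ORDER BY [% Score])
       END AS Lowest
  FROM #df
 ORDER BY CommID, TimePeriod, VisitCount DESC;
```

The trade-off is a slightly longer window definition in exchange for not depending on the range of the scores.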