Keep only one occurrence of each pair per year if a condition is met

I have this table in SQL ("colors"):

CREATE TABLE colors (
    color1 VARCHAR(50),
    color2 VARCHAR(50),
    year INT,
    var1 INT,
    var2 INT,
    var3 INT,
    var4 INT
);


INSERT INTO colors (color1, color2, year, var1, var2, var3, var4) VALUES
    ('red', 'blue', 2010, 1, 2, 1, 2),
    ('blue', 'red', 2010, 1, 2, 1, 2),
    ('red', 'blue', 2011, 1, 2, 5, 3),
    ('blue', 'red', 2011, 5, 3, 1, 2),
    ('orange', NULL, 2010, 5, 9, NULL, NULL),
    ('green', 'white', 2010, 5, 9, 6, 3);

The table looks like this:

 color1 color2 year var1 var2 var3 var4
    red   blue 2010    1    2    1    2
   blue    red 2010    1    2    1    2
    red   blue 2011    1    2    5    3
   blue    red 2011    5    3    1    2
 orange   NULL 2010    5    9 NULL NULL
  green  white 2010    5    9    6    3

I am trying to do the following:

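  • For a pair of colors in the same year (e.g. red/blue/2010 and blue/red/2010): if var1 = var3 AND var2 = var4, keep only one of the two rows
  • For a pair of colors in the same year: if var1 != var3 OR var2 != var4, keep both rows
  • For colors that have no pair in the same year: keep those rows as well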
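The three rules above can be sketched outside SQL as a small reference implementation, which is handy for checking a query's output against edge cases. This is a minimal sketch, not the asker's method: `filter_pairs` and `ROWS` are hypothetical names, a row with a NULL color is assumed to be unpaired, and the first row of a pair in input order is assumed to be the one that is kept.

```python
# Sample rows copied from the question:
# (color1, color2, year, var1, var2, var3, var4); None stands for SQL NULL.
ROWS = [
    ("red", "blue", 2010, 1, 2, 1, 2),
    ("blue", "red", 2010, 1, 2, 1, 2),
    ("red", "blue", 2011, 1, 2, 5, 3),
    ("blue", "red", 2011, 5, 3, 1, 2),
    ("orange", None, 2010, 5, 9, None, None),
    ("green", "white", 2010, 5, 9, 6, 3),
]


def filter_pairs(rows):
    """Keep the first row of each (unordered pair, year) group, plus any
    later row of the group whose var1/var2 differ from var3/var4."""
    seen = set()
    kept = []
    for idx, row in enumerate(rows):
        color1, color2, year, var1, var2, var3, var4 = row
        if color1 is None or color2 is None:
            # Assumption: a row with a missing color never pairs with anything.
            key = ("unpaired", idx)
        else:
            # frozenset ignores order, so red/blue and blue/red share a key.
            key = (frozenset((color1, color2)), year)
        is_first = key not in seen
        seen.add(key)
        if is_first or var1 != var3 or var2 != var4:
            kept.append(row)
    return kept


result = filter_pairs(ROWS)
```

On the sample data, only the second 2010 red/blue row is dropped, so `result` holds the remaining five rows in their original order.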

The final result should look like this:

 color1 color2 year var1 var2 var3 var4
    red   blue 2010    1    2    1    2
    red   blue 2011    1    2    5    3
   blue    red 2011    5    3    1    2
 orange   NULL 2010    5    9 NULL NULL
  green  white 2010    5    9    6    3

Here is the SQL code I tried to write for this:

First I write CTEs to identify the pairs, then I check the OR condition:

WITH pairs AS (
    SELECT *,
    CASE 
        WHEN color1 < color2 THEN color1 || color2 || CAST(year AS VARCHAR(4))
        ELSE color2 || color1 || CAST(year AS VARCHAR(4))
    END AS pair_id
    FROM colors
),
ranked_pairs AS (
    SELECT *,
    ROW_NUMBER() OVER(PARTITION BY pair_id ORDER BY color1, color2) as row_num
    FROM pairs
)
SELECT color1, color2, year, var1, var2, var3, var4
FROM ranked_pairs
WHERE row_num = 1 OR var1 != var3 OR var2 != var4;

The output looks like this:

 color1 color2 year var1 var2 var3 var4
 orange   <NA> 2010    5    9   NA   NA
   blue    red 2010    1    2    1    2
   blue    red 2011    5    3    1    2
    red   blue 2011    1    2    5    3
  green  white 2010    5    9    6    3

Am I doing this correctly? The final result looks right, but I am not confident; for example, this code might not work for some edge cases.

Thanks!

Tags: sql, netezza

1 answer

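Ordering by the colors within each pair_id looks wrong when the same pair appears in different orders. In addition, you are treating NULL values as equal: a != comparison involving NULL is never true, so such rows are never recognized as different.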
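The NULL point can be illustrated with a small model of SQL's three-valued logic. This is only a sketch written in Python: `sql_not_equal` and `is_distinct_from` are hypothetical helper names that mimic the standard behavior of `!=` and `IS DISTINCT FROM`, with `None` standing in for NULL.

```python
def sql_not_equal(a, b):
    """Model of SQL `a != b`: the result is UNKNOWN (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def is_distinct_from(a, b):
    """Model of SQL `a IS DISTINCT FROM b`: always True or False.
    Two NULLs are not distinct; a NULL and a non-NULL value are distinct."""
    if a is None or b is None:
        return a is not b
    return a != b


# The 'orange' row from the question has var1 = 5 and var3 = NULL.
var1, var3 = 5, None

# A WHERE clause keeps a row only when the condition is TRUE, so an
# UNKNOWN result drops the row unless another OR branch is TRUE.
plain_comparison = sql_not_equal(var1, var3)
null_safe_comparison = is_distinct_from(var1, var3)
```

Here `plain_comparison` is `None` (UNKNOWN), so that branch of the `WHERE` clause cannot keep the row, while `null_safe_comparison` is `True`.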

Please check the following version:

WITH pairs AS (
    SELECT
        color1,
        color2,
        year,
        var1,
        var2,
        var3,
        var4,
        CASE 
            WHEN color1 < color2 THEN color1 || color2 || CAST(year AS VARCHAR(4))
            ELSE color2 || color1 || CAST(year AS VARCHAR(4))
        END AS pair_id
    FROM colors
),
ranked_pairs AS (
    SELECT
        color1,
        color2,
        year,
        var1,
        var2,
        var3,
        var4,
        ROW_NUMBER() OVER(PARTITION BY pair_id ORDER BY LEAST(color1, color2), GREATEST(color1, color2)) as row_num
    FROM pairs
)
SELECT color1, color2, year, var1, var2, var3, var4
FROM ranked_pairs
WHERE row_num = 1 OR var1 IS DISTINCT FROM var3 OR var2 IS DISTINCT FROM var4;
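One further edge case that may be worth checking is the construction of pair_id itself: concatenating the two colors and the year without a separator can make two different pairs produce the same key. The sketch below models this in Python. `naive_pair_id` imitates the CASE expression for non-NULL colors, `delimited_pair_id` is a hypothetical variant, and the `|` separator is an assumption that only helps if it never occurs in a color name.

```python
def naive_pair_id(color1, color2, year):
    """Smaller color first, then the other color, then the year, no separator."""
    first, second = sorted((color1, color2))
    return first + second + str(year)


def delimited_pair_id(color1, color2, year, sep="|"):
    """Same idea, but with a separator between the three parts (assumption:
    the separator never appears inside a color name)."""
    first, second = sorted((color1, color2))
    return sep.join((first, second, str(year)))


# Two different pairs whose concatenations coincide.
collision_a = naive_pair_id("ab", "c", 2010)
collision_b = naive_pair_id("a", "bc", 2010)

safe_a = delimited_pair_id("ab", "c", 2010)
safe_b = delimited_pair_id("a", "bc", 2010)
```

Without a separator both pairs map to the same key and would land in the same partition; with a separator the keys differ. In SQL the equivalent change would be to concatenate a literal separator between the parts of pair_id, or to partition by the two ordered colors and the year as separate columns.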