Imagine I have this dataset:
serial_id | name | address_id | id_duplicates | dob
_______________________________________________________
1 | JOHN | QWERTY | NULL | 10/2001
2 | JOHN | QWERTY | NULL | 10/2001
3 | JOHN | AZERTY | NULL | 10/2001
4 | JOHN | QWERTY | NULL | 09/2001
5 | MARY | QWERTY | NULL | 10/2001
6 | MARY | AZERTY | NULL | 10/2001
7 | MARY | AZERTY | NULL | 10/2001
When records match under certain conditions, I want to fill id_duplicates with any one of their serial_id values.
If I want records that match on name, address_id and dob to share a single serial_id value in the id_duplicates column, I would have, for example:
serial_id | name | address_id | id_duplicates | dob
_______________________________________________________
1 | JOHN | QWERTY | 1 | 10/2001 --> match
2 | JOHN | QWERTY | 1 | 10/2001 --> match
3 | JOHN | AZERTY | 3 | 10/2001 --> no match on address_id
4 | JOHN | QWERTY | 4 | 09/2001 --> no match on dob
5 | MARY | QWERTY | 5 | 10/2001 --> no match on name
6 | MARY | AZERTY | 6 | 10/2001 --> match
7 | MARY | AZERTY | 6 | 10/2001 --> match
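I have tried hard to do this with nested queries, but they make so little sense that I will spare you by not posting them...
Any help would be much appreciated!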
You can use dense_rank():
select t.*,
       -- rows with the same (name, address_id, dob) get the same rank
       dense_rank() over (order by name, address_id, dob) as new_id_duplicates
from t;
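Note that dense_rank() numbers the groups 1, 2, 3, ... in sort order, so the values identify duplicates correctly but do not match the expected output in the question, where each group carries its smallest serial_id. If matching that output matters, a minimal sketch with min() as a window function could look like the following. It assumes the table is called t, as above, and the alias new_id_duplicates is only illustrative:

```sql
select t.*,
       -- smallest serial_id within each (name, address_id, dob) group
       min(serial_id) over (partition by name, address_id, dob) as new_id_duplicates
from t
order by serial_id;
```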
If you want to do this in an update, here is one method:
update t
set id_duplicates = tt.new_id_duplicates
from (select t.*,
             dense_rank() over (order by name, address_id, dob) as new_id_duplicates
      from t
     ) tt
where tt.serial_id = t.serial_id;
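To try the update end to end, here is a small self-contained sketch. It assumes PostgreSQL (the update ... from syntax above is PostgreSQL-style), guesses the column types from the sample data, and swaps dense_rank() for min(serial_id) so that the stored values reproduce the expected output from the question rather than consecutive rank numbers:

```sql
-- Assumed schema: the types are guesses based on the sample data.
create table t (
    serial_id     integer primary key,
    name          text,
    address_id    text,
    id_duplicates integer,
    dob           text
);

insert into t (serial_id, name, address_id, dob) values
    (1, 'JOHN', 'QWERTY', '10/2001'),
    (2, 'JOHN', 'QWERTY', '10/2001'),
    (3, 'JOHN', 'AZERTY', '10/2001'),
    (4, 'JOHN', 'QWERTY', '09/2001'),
    (5, 'MARY', 'QWERTY', '10/2001'),
    (6, 'MARY', 'AZERTY', '10/2001'),
    (7, 'MARY', 'AZERTY', '10/2001');

-- Give every row the smallest serial_id of its (name, address_id, dob) group.
update t
set id_duplicates = tt.new_id_duplicates
from (select serial_id,
             min(serial_id) over (partition by name, address_id, dob) as new_id_duplicates
      from t
     ) tt
where tt.serial_id = t.serial_id;

select serial_id, id_duplicates
from t
order by serial_id;
-- id_duplicates for serial_id 1..7: 1, 1, 3, 4, 5, 6, 6
```

Rows without a duplicate simply keep their own serial_id, which is what the question's expected output shows for rows 3, 4 and 5.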