我有下表(标记):
firstname lastname Mark
------------------------------
arun prasanth 40
ann antony 45
sruthy abc 41
new abc 47
arun prasanth 45
arun prasanth 49
ann antony 49
并且如果具有特定列的记录多次出现,则希望添加一个标记列。这是结果:
firstname lastname Mark MULTI_FLAG
----------------------------------------------
arun prasanth 40 1
ann antony 45 1
sruthy abc 41 0
new abc 47 0
arun prasanth 45 1
arun prasanth 49 1
ann antony 49 1
我可以使用以下GROUP BY查询获得结果:
SELECT M1.firstname
,M1.lastname
,M1.Mark
,M2.MULTI_COUNT
FROM Marks M1
JOIN (SELECT firstname, lastname, CASE WHEN COUNT (*) > 1 THEN 1 ELSE 0 END AS MULTI_COUNT
FROM Marks
GROUP BY firstname, lastname) M2
ON M2.firstname = M1.firstname AND M2.lastname = M1.lastname;
或者通过这个更漂亮的PARTITION BY查询:
SELECT
firstname,
lastname,
CASE WHEN COUNT(*) OVER (PARTITION BY
firstname,
lastname) > 1 THEN 1 ELSE 0 END AS MULTI_FLAG
FROM
Marks
在类似的大表上运行GROUP BY查询返回:34 m 56 s 595 ms
在返回的类似大表上运行PARTITION BY查询:
我有兴趣知道:
编辑:Oracle版本Oracle数据库11g企业版版本11.2.0.3.0 - 64位生产PL / SQL版本11.2.0.3.0 - 生产“CORE 11.2.0.3.0生产”TNS for Linux:版本11.2.0.3.0 - 生产NLSRTL版本11.2.0.3.0 - 生产
按计划划分
PLAN_TABLE_OUTPUT
Plan hash value: 3822227444
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 668K| 90M| | 90429 (1)| 00:18:06 |
| 1 | WINDOW SORT | | 668K| 90M| 98M| 90429 (1)| 00:18:06 |
|* 2 | HASH JOIN RIGHT OUTER | | 668K| 90M| | 69340 (1)| 00:13:53 |
| 3 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM"
AND "PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE
'INPUT%')
GROUP BY计划
PLAN_TABLE_OUTPUT
Plan hash value: 648231064
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 912 | 2052K| | 226K (1)| 00:45:22 |
|* 1 | HASH JOIN | | 912 | 2052K| | 226K (1)| 00:45:22 |
| 2 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 4779 | | 3 (0)| 00:00:01 |
|* 3 | HASH JOIN | | 89667 | 194M| 45M| 226K (1)| 00:45:22 |
| 4 | NESTED LOOPS | | | | | | |
| 5 | NESTED LOOPS | | 377K| 41M| | 69335 (1)| 00:13:53 |
| 6 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 7 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID | Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 2016 | | 4 (0)| 00:00:01 |
| 9 | VIEW | | 668K| 1377M| | 86518 (1)| 00:17:19 |
| 10 | HASH GROUP BY | | 668K| 72M| 80M| 86518 (1)| 00:17:19 |
|* 11 | HASH JOIN RIGHT OUTER | | 668K| 72M| | 69340 (1)| 00:13:53 |
| 12 | TABLE ACCESS FULL | COUNTRY_REGION_MAPPINGS | 177 | 2478 | | 3 (0)| 00:00:01 |
| 13 | NESTED LOOPS | | | | | | |
| 14 | NESTED LOOPS | | 377K| 35M| | 69335 (1)| 00:13:53 |
| 15 | MAT_VIEW ACCESS FULL | PROJINFO_MAX_ITER_MVW | 17713 | 328K| | 782 (1)| 00:00:10 |
|* 16 | INDEX RANGE SCAN | Q_CLIN_ASSUM_BYCOUN_PK | 1 | | | 3 (0)| 00:00:01 |
| 17 | TABLE ACCESS BY INDEX ROWID| Q_CLINICAL_ASSUM_BYCOUNTRY | 21 | 1701 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("R2"."TRIAL_COUNTRY_CD"="CRM"."COUNTRY_CD" AND
UPPER("CRM"."COUNTRY")=UPPER("QCAB"."TRIAL_COUNTRY"))
3 - access("R2"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "R2"."ITERATION"="QCAB"."ITERATION" AND
"R2"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND "R2"."ASSUMPTION"="QCAB"."ASSUMPTION")
7 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
11 - access(UPPER("CRM"."COUNTRY"(+))=UPPER("QCAB"."TRIAL_COUNTRY"))
16 - access("PMIM"."OPPORTUNITYNUM"="QCAB"."OPPORTUNITYNUM" AND "PMIM"."CONTRACTNUM"="QCAB"."CONTRACTNUM" AND
"PMIM"."ITERATION"="QCAB"."ITERATION")
filter(UPPER("QCAB"."SHEET_LOC") LIKE '%COUNTRY ASSUMPTIONS%' OR UPPER("QCAB"."SHEET_LOC") LIKE 'INPUT%')
通常,您从分析函数count(*)
开始,这导致了一个紧凑的SQL。
这种方法的缺点是必须对数据进行排序(参见WINDOW SORT
操作)。 GROUP BY
方法避免了排序,因为可以使用HASH GROUP BY
,这可以带来更好的性能。
您的示例更复杂,因为您不使用表,而是使用连接三个表的视图 - 对于GROUP BY
和详细数据,此连接执行两次;这当然不是最佳的。
所以我将从查询的分析函数版本开始(可能使用PARALLEL
option)。
如果您想尝试GROUP BY
,可以使用lightway版本:
1)仅对重复的密钥进行分组
2)让OUTER JOIN
指定MULTI_FLAG
以下是执行计划的示例 - 使用您的数据进行简单测试
with dups as (
select firstname,lastname from tmp
group by firstname,lastname
having count(*) > 1)
select tmp.FIRSTNAME, tmp.LASTNAME, tmp.MARK,
case when dups.firstname is not NULL then 1 else 0 end as MULTI_FLAG
from tmp
left outer join dups on tmp.firstname = dups.firstname and tmp.lastname = dups.lastname;
您仍然需要访问您的视图两次,但最终的连接速度会更快(特别是如果您只有少量的重复键)。
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 105K| 26M| | 1673 (1)| 00:00:21 |
|* 1 | HASH JOIN RIGHT OUTER| | 105K| 26M| 11M| 1673 (1)| 00:00:21 |
| 2 | VIEW | | 105K| 10M| | 128 (4)| 00:00:02 |
|* 3 | FILTER | | | | | | |
| 4 | HASH GROUP BY | | 105K| 10M| | 128 (4)| 00:00:02 |
| 5 | TABLE ACCESS FULL| TMP | 105K| 10M| | 125 (1)| 00:00:02 |
| 6 | TABLE ACCESS FULL | TMP | 105K| 15M| | 125 (1)| 00:00:02 |
--------------------------------------------------------------------------------------