我们将进行大规模的数据库清理操作,总共将删除约 1 亿行。有 40 个表可供删除数据。这是我的想法,我愿意接受建议
#1 方法
批量收集然后删除,同时记录删除的行。每 100 行提交一次
示例->
--define record and needed variables
commit_counter NUMBER := 0;
COMMIT_LIMIT CONSTANT NUMBER := 100;
v_total_deleted_services NUMBER := 0;
TYPE t_record_entity_test IS RECORD (
ENTITY_ID NUMBER,
SOURCE VARCHAR2(100),
SOURCE_ID VARCHAR2(100),
MESSAGE_ID VARCHAR2(100),
STATUS VARCHAR2(200)
);
TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
v_records_test t_record_entity_tests;
//Make cursor
CURSOR c_services IS
SELECT --all the data needed--
OPEN c_services;
LOOP
FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
EXIT WHEN v_records_test.COUNT = 0;
FORALL i IN 1..v_records_test.COUNT
INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED,' || ' Status: ' ||v_records_test(i).status , SYSDATE);
FORALL i IN 1..v_records_test.COUNT
DELETE FROM SERVICE WHERE ENTITY_ID = v_records_test(i).ENTITY_ID;
v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;
commit_counter := commit_counter + v_records_test.COUNT;
IF commit_counter >= COMMIT_LIMIT THEN
COMMIT;
commit_counter := 0;
END IF;
end loop;
close c_services;
commit;
--log number of deleted rows
#2 方法
批量收集并记录正在删除的行。一次性全部删除,最后commit。不知道是否可以,因为其中一项操作可能会删除 1000 万行
--define record and needed variables
v_total_deleted_services NUMBER := 0;
TYPE t_record_entity_test IS RECORD (
ENTITY_ID NUMBER,
SOURCE VARCHAR2(100),
SOURCE_ID VARCHAR2(100),
MESSAGE_ID VARCHAR2(100),
STATUS VARCHAR2(200)
);
TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
v_records_test t_record_entity_tests;
//Make cursor
CURSOR c_services IS
SELECT --all the data needed--
OPEN c_services;
LOOP
FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
EXIT WHEN v_records_test.COUNT = 0;
FORALL i IN 1..v_records_test.COUNT
INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED,' || ' Status: ' ||v_records_test(i).status , SYSDATE);
end loop;
close c_services;
DELETE FROM SERVICE WHERE ENTITY_ID = --select entity_id of data needed to be deleted that is the same data that's in the cursor;
v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;
commit;
--log number of deleted rows
什么是更好的方法?还有第三种方法比这两种方法更好吗?
您没有处理任何异常。您确定对所有行进行
delete
操作一切都会顺利吗?例如,外键约束怎么样?
无论如何:如果你逐行执行,它会越来越慢,并且删除 1000 万行肯定需要时间(正如你所说)。如果您切换到设置处理并使用
forall
功能而不是 table
,速度会更快。像这样的东西:
日志表:
SQL> create table delete_log (empno number, log_date date);
Table created.
样本表;应删除
deptno <> 30
的行:
SQL> select * From test order by deptno, ename;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7782 CLARK MANAGER 7839 09.06.1981 00:00:00 2450 10
7839 KING PRESIDENT 17.11.1981 00:00:00 5000 10
7934 MILLER CLERK 7782 23.01.1982 00:00:00 1300 10
7876 ADAMS CLERK 7788 12.01.1983 00:00:00 1100 20
7902 FORD ANALYST 7566 03.12.1981 00:00:00 3000 20
7566 JONES MANAGER 7839 02.04.1981 00:00:00 2975 20
7788 SCOTT ANALYST 7566 09.12.1982 00:00:00 3000 20
7369 SMITH CLERK 7902 17.12.1980 00:00:00 800 20
7499 ALLEN SALESMAN 7698 20.02.1981 00:00:00 1600 300 30
7698 BLAKE MANAGER 7839 01.05.1981 00:00:00 2850 30
7900 JAMES CLERK 7698 03.12.1981 00:00:00 950 30
7654 MARTIN SALESMAN 7698 28.09.1981 00:00:00 1250 1400 30
7844 TURNER SALESMAN 7698 08.09.1981 00:00:00 1500 0 30
7521 WARD SALESMAN 7698 22.02.1981 00:00:00 1250 500 30
14 rows selected.
程序:
SQL> declare
2 l_tab sys.odcinumberlist;
3 l_tot number := 0;
4 cursor c1 is select empno from test where deptno <> 30;
5 begin
6 open c1;
7 loop
8 fetch c1 bulk collect into l_tab limit 3;
9 exit when l_tab.count = 0;
10
11 insert into delete_log (empno, log_date)
12 select column_value, sysdate
13 from table(l_tab);
14
15 delete from test t
16 where exists (select null from table(l_tab)
17 where column_value = t.empno);
18
19 l_tot := l_tot + sql%rowcount;
20 end loop;
21 dbms_output.put_line('Deleted ' || l_tot || ' rows');
22 end;
23 /
Deleted 8 rows
PL/SQL procedure successfully completed.
结果:
SQL> select * From test order by deptno, ename;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
7499 ALLEN SALESMAN 7698 20.02.1981 00:00:00 1600 300 30
7698 BLAKE MANAGER 7839 01.05.1981 00:00:00 2850 30
7900 JAMES CLERK 7698 03.12.1981 00:00:00 950 30
7654 MARTIN SALESMAN 7698 28.09.1981 00:00:00 1250 1400 30
7844 TURNER SALESMAN 7698 08.09.1981 00:00:00 1500 0 30
7521 WARD SALESMAN 7698 22.02.1981 00:00:00 1250 500 30
6 rows selected.
日志:
SQL> select * From delete_log;
EMPNO LOG_DATE
---------- -------------------
7782 16.11.2023 11:40:45
7788 16.11.2023 11:40:45
7839 16.11.2023 11:40:45
7876 16.11.2023 11:40:45
7902 16.11.2023 11:40:45
7934 16.11.2023 11:40:45
7369 16.11.2023 11:40:45
7566 16.11.2023 11:40:45
8 rows selected.
SQL>
在提交时(我没有在这里实现;你知道怎么做):100(在我看来)太低了;将其设置为例如10000(等于您在
fetch
中使用的限制)。
提醒您:考虑异常处理(如有必要)。
如果您可以在应用程序停机时执行此操作,并且要删除大型表的很大一部分,那么使用您想要保留的行创建一个新段比删除不需要的行要高效得多t。最有效的是 CTAS 并替换:
CREATE TABLE abc$new PARALLEL (DEGREE 16) AS SELECT * FROM abc WHERE [rows-I-want-to-keep];
ALTER TABLE abc RENAME TO abc$old;
ALTER TABLE abc$new RENAME TO abc;
CREATE TABLE abc$old NOLOGGING PARALLEL (DEGREE 16) AS SELECT * FROM abc;
TRUNCATE TABLE abc;
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(abc,16) */ INTO abc SELECT * FROM abc$old WHERE [rows-I-want-to-keep];
COMMIT;
这里的缺点是,桌子有一段时间是空的,所以你的应用程序最好关闭。这两种技术都会生成一个不再包含您希望删除的行的表,以及另一个包含原始内容的表,以备您需要恢复时使用。然后,您可以计划稍后在确定不需要数据后删除
abc$old
表,以便释放空间。
当然,如果您必须在应用程序使用这些表时在线执行这些维护操作,那么该要求将迫使您重新使用您正在考虑的某种逐步批量删除过程。这会慢得多,但侵入性较小。