Deleting 100 million rows from an Oracle database. Best approach?


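We are about to run a large-scale database cleanup that will delete roughly 100 million rows in total. Data has to be deleted from 40 tables. These are my ideas, and I'm open to suggestions.

Approach #1

Bulk collect, then delete, logging the deleted rows as we go. Commit every 100 rows.

Example: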

DECLARE
    -- Counters, record type, and collection
    commit_counter           NUMBER := 0;
    COMMIT_LIMIT             CONSTANT NUMBER := 100;
    v_total_deleted_services NUMBER := 0;

    TYPE t_record_entity_test IS RECORD (
        ENTITY_ID  NUMBER,
        SOURCE     VARCHAR2(100),
        SOURCE_ID  VARCHAR2(100),
        MESSAGE_ID VARCHAR2(100),
        STATUS     VARCHAR2(200)
    );

    TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
    v_records_test t_record_entity_tests;

    -- Cursor over the rows to be deleted
    CURSOR c_services IS
        SELECT --all the data needed--
BEGIN
    OPEN c_services;
    LOOP
        FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
        EXIT WHEN v_records_test.COUNT = 0;

        -- Log the batch before deleting it
        FORALL i IN 1..v_records_test.COUNT
            INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
            VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED,' || ' Status: ' || v_records_test(i).status, SYSDATE);

        -- Delete the batch
        FORALL i IN 1..v_records_test.COUNT
            DELETE FROM SERVICE WHERE ENTITY_ID = v_records_test(i).ENTITY_ID;
        v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;

        commit_counter := commit_counter + v_records_test.COUNT;

        -- With a fetch LIMIT of 10000 and COMMIT_LIMIT of 100, this commits after every batch
        IF commit_counter >= COMMIT_LIMIT THEN
            COMMIT;
            commit_counter := 0;
        END IF;
    END LOOP;
    CLOSE c_services;
    COMMIT;
    --log number of deleted rows
END;
/

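Approach #2

Bulk collect and log the rows that are about to be deleted. Then delete everything in one statement and commit at the end. I'm not sure this is feasible, because a single one of these operations may delete 10 million rows.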

DECLARE
    -- Counter, record type, and collection
    v_total_deleted_services NUMBER := 0;

    TYPE t_record_entity_test IS RECORD (
        ENTITY_ID  NUMBER,
        SOURCE     VARCHAR2(100),
        SOURCE_ID  VARCHAR2(100),
        MESSAGE_ID VARCHAR2(100),
        STATUS     VARCHAR2(200)
    );

    TYPE t_record_entity_tests IS TABLE OF t_record_entity_test INDEX BY PLS_INTEGER;
    v_records_test t_record_entity_tests;

    -- Cursor over the rows to be deleted
    CURSOR c_services IS
        SELECT --all the data needed--
BEGIN
    OPEN c_services;
    LOOP
        FETCH c_services BULK COLLECT INTO v_records_test LIMIT 10000;
        EXIT WHEN v_records_test.COUNT = 0;

        -- Log each batch; nothing is deleted inside the loop
        FORALL i IN 1..v_records_test.COUNT
            INSERT INTO DELETE_LOG_TEST(SOURCE, SOURCE_ID, status, log_date)
            VALUES (v_records_test(i).SOURCE, v_records_test(i).SOURCE_ID, 'Service DELETED,' || ' Status: ' || v_records_test(i).status, SYSDATE);
    END LOOP;
    CLOSE c_services;

    -- One DELETE for all rows, using the same selection as the cursor
    DELETE FROM SERVICE
     WHERE ENTITY_ID IN (
           --select entity_id of data needed to be deleted that is the same data that's in the cursor
     );

    v_total_deleted_services := v_total_deleted_services + SQL%ROWCOUNT;

    COMMIT;
    --log number of deleted rows
END;
/

Which approach is better? And is there a third approach that beats both?

oracle plsql bulk
2 answers

0 votes

You aren't handling any exceptions. Are you sure the delete will go smoothly for every row? What about foreign key constraints, for example?
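
One way to check the foreign key concern before deleting is to ask the data dictionary which child tables reference the table being purged. The query below is a minimal sketch: it assumes the parent table is the SERVICE table from the question and that the session can see the relevant constraints through ALL_CONSTRAINTS.

```sql
-- List foreign keys in other tables that point at SERVICE.
-- c = the child (referencing) constraint, p = the parent key it references.
SELECT c.owner,
       c.table_name,
       c.constraint_name,
       c.delete_rule,      -- NO ACTION, CASCADE, or SET NULL
       c.status
  FROM all_constraints c
  JOIN all_constraints p
    ON p.owner = c.r_owner
   AND p.constraint_name = c.r_constraint_name
 WHERE c.constraint_type = 'R'       -- 'R' = referential (foreign key)
   AND p.table_name = 'SERVICE';     -- assumption: parent table from the question
```

Any row returned with a delete rule of NO ACTION means the child rows have to be removed first, otherwise the delete on the parent fails with ORA-02292.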

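Anyway: if you do it row by row, it will be slow, and deleting 10 million rows will certainly take time (as you said). It will be faster if you switch to set processing and use the table function instead of forall. Something like this: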

Log table:

SQL> create table delete_log (empno number, log_date date);

Table created.

Sample table; rows with deptno <> 30 should be deleted:

SQL> select * From test order by deptno, ename;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7782 CLARK      MANAGER         7839 09.06.1981 00:00:00       2450                    10
      7839 KING       PRESIDENT            17.11.1981 00:00:00       5000                    10
      7934 MILLER     CLERK           7782 23.01.1982 00:00:00       1300                    10
      7876 ADAMS      CLERK           7788 12.01.1983 00:00:00       1100                    20
      7902 FORD       ANALYST         7566 03.12.1981 00:00:00       3000                    20
      7566 JONES      MANAGER         7839 02.04.1981 00:00:00       2975                    20
      7788 SCOTT      ANALYST         7566 09.12.1982 00:00:00       3000                    20
      7369 SMITH      CLERK           7902 17.12.1980 00:00:00        800                    20
      7499 ALLEN      SALESMAN        7698 20.02.1981 00:00:00       1600        300         30
      7698 BLAKE      MANAGER         7839 01.05.1981 00:00:00       2850                    30
      7900 JAMES      CLERK           7698 03.12.1981 00:00:00        950                    30
      7654 MARTIN     SALESMAN        7698 28.09.1981 00:00:00       1250       1400         30
      7844 TURNER     SALESMAN        7698 08.09.1981 00:00:00       1500          0         30
      7521 WARD       SALESMAN        7698 22.02.1981 00:00:00       1250        500         30

14 rows selected.

Procedure:

SQL> declare
  l_tab sys.odcinumberlist;
  l_tot number := 0;
  cursor c1 is select empno from test where deptno <> 30;
begin
  open c1;
  loop
    -- fetch the next batch of keys (limit 3 only because the sample table is tiny)
    fetch c1 bulk collect into l_tab limit 3;
    exit when l_tab.count = 0;

    -- log the whole batch with one set-based insert
    insert into delete_log (empno, log_date)
      select column_value, sysdate
      from table(l_tab);

    -- delete the whole batch with one set-based delete
    delete from test t
      where exists (select null from table(l_tab)
                    where column_value = t.empno);

    l_tot := l_tot + sql%rowcount;
  end loop;
  close c1;
  dbms_output.put_line('Deleted ' || l_tot || ' rows');
end;
/
Deleted 8 rows

PL/SQL procedure successfully completed.

Result:

SQL> select * From test order by deptno, ename;

     EMPNO ENAME      JOB              MGR HIREDATE                   SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ------------------- ---------- ---------- ----------
      7499 ALLEN      SALESMAN        7698 20.02.1981 00:00:00       1600        300         30
      7698 BLAKE      MANAGER         7839 01.05.1981 00:00:00       2850                    30
      7900 JAMES      CLERK           7698 03.12.1981 00:00:00        950                    30
      7654 MARTIN     SALESMAN        7698 28.09.1981 00:00:00       1250       1400         30
      7844 TURNER     SALESMAN        7698 08.09.1981 00:00:00       1500          0         30
      7521 WARD       SALESMAN        7698 22.02.1981 00:00:00       1250        500         30

6 rows selected.

Log:

SQL> select * From delete_log;

     EMPNO LOG_DATE
---------- -------------------
      7782 16.11.2023 11:40:45
      7788 16.11.2023 11:40:45
      7839 16.11.2023 11:40:45
      7876 16.11.2023 11:40:45
      7902 16.11.2023 11:40:45
      7934 16.11.2023 11:40:45
      7369 16.11.2023 11:40:45
      7566 16.11.2023 11:40:45

8 rows selected.

SQL>

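As for committing (which I didn't implement here; you know how to do it): 100 is, in my opinion, too low. Set it to, say, 10000 (equal to the limit you use in the fetch).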

A reminder: consider exception handling (if necessary).
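
The commit and exception handling that the answer leaves out could look roughly like the sketch below. It reuses the test and delete_log tables from the example above. The batch size of 10000, the choice to roll back the failing batch and re-raise, and the message text are all assumptions for illustration, not part of the original answer.

```sql
declare
  l_tab sys.odcinumberlist;
  l_tot number := 0;
  cursor c1 is select empno from test where deptno <> 30;
begin
  open c1;
  loop
    fetch c1 bulk collect into l_tab limit 10000;  -- assumed batch size
    exit when l_tab.count = 0;

    begin
      insert into delete_log (empno, log_date)
        select column_value, sysdate
        from table(l_tab);

      delete from test t
        where exists (select null from table(l_tab)
                      where column_value = t.empno);

      l_tot := l_tot + sql%rowcount;
      commit;  -- one commit per batch, so the log and the delete stay in step
    exception
      when others then
        -- undo only the current batch; earlier batches are already committed
        rollback;
        dbms_output.put_line('Batch failed: ' || sqlerrm);
        if c1%isopen then
          close c1;
        end if;
        raise;  -- assumption: stop on the first error rather than skip the batch
    end;
  end loop;
  close c1;
  dbms_output.put_line('Deleted ' || l_tot || ' rows');
end;
/
```

One thing to watch with this shape: the cursor stays open across commits while the same table is being modified, so a long run can hit ORA-01555 (snapshot too old). Sizing undo generously, or re-opening the cursor for each batch, are the usual ways around that.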


0 votes

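If you can do this during application downtime, and you are deleting a large fraction of a large table, then creating a new segment from the rows you want to keep is far more efficient than deleting the rows you don't want. The most efficient option is CTAS and swap:

CREATE TABLE abc$new PARALLEL (DEGREE 16) AS SELECT * FROM abc WHERE [rows-I-want-to-keep];
ALTER TABLE abc RENAME TO abc$old;
ALTER TABLE abc$new RENAME TO abc;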

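The downside is that you also have to script and re-apply every dependent object, such as indexes, constraints, triggers, and grants. But this is the most efficient way to get the job done (least processing time).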
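
For scripting those dependent objects, Oracle's DBMS_METADATA package can generate the DDL before the swap. The queries below are a sketch: they assume the table is ABC in the current schema and that it actually has objects of each type, since GET_DEPENDENT_DDL raises an error when there are none of the requested type.

```sql
-- Run before the rename, while the dependents are still attached to ABC.
-- Each call returns a CLOB with the CREATE/ALTER/GRANT statements.
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('INDEX', 'ABC')          AS index_ddl      FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('CONSTRAINT', 'ABC')     AS constraint_ddl FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('REF_CONSTRAINT', 'ABC') AS fk_ddl         FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('TRIGGER', 'ABC')        AS trigger_ddl    FROM dual;
SELECT DBMS_METADATA.GET_DEPENDENT_DDL('OBJECT_GRANT', 'ABC')   AS grant_ddl      FROM dual;
```

The generated statements still need reviewing before they are replayed against the new table. Index and constraint names already exist on abc$old after the rename, so they have to be renamed or dropped there first.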
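A slightly less efficient approach, which needs less care around dependent objects, uses the same idea but moves the data twice, so that the original object remains the permanent one: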

CREATE TABLE abc$old NOLOGGING PARALLEL (DEGREE 16) AS SELECT * FROM abc;
TRUNCATE TABLE abc;
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(abc,16) */ INTO abc SELECT * FROM abc$old WHERE [rows-I-want-to-keep];
COMMIT;

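The downside here is that the table is empty for a while, so your application had better be down. Both techniques leave you with one table that no longer contains the rows you wanted to delete, and another table that holds the original contents in case you need to restore. You can then plan to drop the abc$old table later, once you are sure the data is no longer needed, to free the space.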

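Of course, if you have to perform this maintenance online while the application is using the tables, that requirement forces you back to some kind of incremental batch delete like the one you are considering. It will be much slower, but less intrusive.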
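
A minimal sketch of such an incremental delete is shown below. It is not the answerer's code: the created_date column, the cutoff date, and the batch size of 10000 are hypothetical stand-ins for whatever identifies the rows to remove.

```sql
begin
  loop
    -- Delete at most 10000 matching rows per pass (assumed batch size).
    delete from abc
     where created_date < date '2020-01-01'   -- hypothetical purge condition
       and rownum <= 10000;

    exit when sql%rowcount = 0;  -- nothing left to delete
    commit;                      -- release locks and undo after every batch
  end loop;
  commit;
end;
/
```

Because each pass is a fresh statement rather than a fetch from a long-lived cursor, there is no fetch across commits. The trade-off is that every pass has to locate the remaining rows again, so the purge condition should be supported by an index or partitioning.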
