Oracle query to list every occurrence of a string in a CLOB column together with the value that follows it


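I have a CLOB column in an Oracle table and want to extract every occurrence of a string matching the pattern "RES_GetResData_Public_ScreenPrint", together with the value that follows it (values are delimited by "|").

Sample data: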

|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|


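For the sample above, I am using the query below to get all occurrences of the string "RES_GetResData_Public_ScreenPrint" and their subsequent values (delimited by |) from the CLOB data.

The query does return every occurrence of "RES_GetResData_Public_ScreenPrint", but it returns only the first subsequent value in all rows.

**Query:**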

select objectid,
       REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1, column_value) column1,
       REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from   test_table
       cross join table(cast(multiset(
         select level
         from   dual
         connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint')
       ) as sys.odcinumberlist))

Actual result:

OBJECTID COLUMN1 COLUMN2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028079
12345 RES_GetResData_Public_ScreenPrint013 F28028079

Expected result:

OBJECTID COLUMN1 COLUMN2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028081
12345 RES_GetResData_Public_ScreenPrint013 F28028007

I am new to writing queries, so any suggestions would be helpful.

Thanks!

regex oracle loops clob
1 Answer

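You don't need regular expressions for this. Although it takes more typing, you may find that simple string functions are faster than regular expressions: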

WITH line_bounds (objectid, data, spos, epos) AS (
  SELECT objectid,
         data,
         1,
         INSTR(data, CHR(10), 1)
  FROM   test_table
UNION ALL
  SELECT objectid,
         data,
         epos + 1,
         INSTR(data, CHR(10), epos + 1)
  FROM   line_bounds
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY objectid SET orderid,
lines (objectid, line) AS (
  SELECT objectid,
         CASE epos
         WHEN 0
         THEN SUBSTR(data, spos)
         ELSE SUBSTR(data, spos, epos -spos)
         END
  FROM   line_bounds
),
match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
  SELECT objectid,
         line,
         INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
  FROM   lines
)
SELECT objectid,
       SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
       SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM   match_bounds
WHERE  res_spos > 0;

Which, for the sample data:

CREATE TABLE test_table (objectid, data) AS
SELECT 12345,
       EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
FROM   DUAL;

Outputs:

OBJECTID COLUMN1 COLUMN2
12345 RES_GetResData_Public_ScreenPrint010 F28028079
12345 RES_GetResData_Public_ScreenPrint011 F28028081
12345 RES_GetResData_Public_ScreenPrint013 F28028007
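
Note that the query above splits the CLOB on newlines and takes only the first match in each line, so it relies on the data containing at most one RES_GetResData_Public_ScreenPrint entry per line. If the whole value sits on a single line, as the sample in the question appears to, a variation that walks from one occurrence to the next with INSTR may be closer to what is needed. The following is an untested sketch under that assumption; it reuses test_table from above, and the CTE names occurrences and bounds are made up for illustration:

```sql
-- Sketch: step through every occurrence in the whole CLOB,
-- without assuming any newline separators.
WITH occurrences (objectid, data, res_spos) AS (
  -- Anchor: position of the first '|RES_...' in the CLOB (0 if none).
  SELECT objectid,
         data,
         INSTR(data, '|RES_GetResData_Public_ScreenPrint', 1)
  FROM   test_table
UNION ALL
  -- Recursive step: search again, starting just after the previous match.
  SELECT objectid,
         data,
         INSTR(data, '|RES_GetResData_Public_ScreenPrint', res_spos + 1)
  FROM   occurrences
  WHERE  res_spos > 0
),
bounds AS (
  -- Positions of the next two '|' delimiters after each match.
  SELECT objectid,
         data,
         res_spos,
         INSTR(data, '|', res_spos + 1, 1) AS res_epos,
         INSTR(data, '|', res_spos + 1, 2) AS next_epos
  FROM   occurrences
  WHERE  res_spos > 0
)
SELECT objectid,
       SUBSTR(data, res_spos + 1, res_epos - res_spos - 1) AS column1,
       SUBSTR(data, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM   bounds
ORDER BY objectid, res_spos;
```

If an entry is not followed by two further | delimiters, the computed length is negative and the affected column should come back as NULL, so incomplete trailing entries may need separate handling.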

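If you do want to use regular expressions (and do compare the performance of both solutions on your data), then you can match the whole pattern and extract the values of the capturing groups: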

select objectid, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         1
       ) AS column1, 
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         2
       ) AS column2
from   test_table
       cross join table(
         cast(
           multiset(
             select level 
             from   dual
             connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
           ) as sys.odcinumberlist
         )
       )

Which has the same output.
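
As for why the query in the question repeated one value: its second REGEXP_SUBSTR call passes the constant 3 as the occurrence argument, so it does not depend on column_value and evaluates to the same token for every generated row. A minimal illustration on a made-up literal (the string, the A prefix and the column aliases are hypothetical and not taken from the real data):

```sql
SELECT column_value AS n,
       -- Constant occurrence: always the third '|'-delimited token,
       -- so this is identical on every row.
       REGEXP_SUBSTR(s, '[^|]+', 1, 3) AS fixed_occurrence,
       -- Occurrence driven by column_value: capture group 2 of the n-th
       -- match, i.e. the value following the n-th 'A...' entry.
       REGEXP_SUBSTR(s, '\|(A[^|]*)\|([^|]*)\|', 1, column_value, NULL, 2) AS per_match
FROM   (SELECT '|x|A1|v1|y|A2|v2|' AS s FROM DUAL)
       CROSS JOIN TABLE(sys.odcinumberlist(1, 2));
```

Tying the occurrence argument to column_value in both calls, and selecting the wanted part with the capture-group argument, is what lets each generated row pick up its own pair of values.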
