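I have a CLOB column in an Oracle table and want to extract every occurrence of the string matching the pattern "RES_GetResData_Public_ScreenPrint", together with the value that follows it (values are delimited by "|").
Sample data -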
|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|
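For the sample above, I use the following query to fetch every occurrence of the string "RES_GetResData_Public_ScreenPrint" and its subsequent value (delimited by |) from the CLOB data.
The query does fetch every occurrence of "RES_GetResData_Public_ScreenPrint", but it returns the first subsequent value on all rows.
**Query -**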
select objectid,
REGEXP_SUBSTR(DATA, 'RES_GetResData_Public_ScreenPrint[^\|]+', 1,column_value) column1,
REGEXP_SUBSTR(DATA, '([^\|]+)', 1, 3) AS column2
from test_table cross join table(cast(multiset(select level
from dual connect by level <= regexp_count(DATA, 'RES_GetResData_Public_ScreenPrint') ) as sys.odcinumberlist))
Actual result -
OBJECTID | COLUMN1 | COLUMN2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028079 |
Expected result -
OBJECTID | COLUMN1 | COLUMN2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028081 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028007 |
I am new to writing queries, so any suggestions would be helpful.
Thanks!!
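You do not need regular expressions for this; although it takes more typing, you may find that simple string functions are faster than regular expressions. The query below splits the CLOB into lines at each newline (`CHR(10)`) and then locates the first match on each line: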
WITH line_bounds (objectid, data, spos, epos) AS (
  -- Anchor: the first line starts at position 1 and ends at the first newline
  SELECT objectid,
         data,
         1,
         INSTR(data, CHR(10), 1)
  FROM   test_table
UNION ALL
  -- Recursive step: the next line starts just after the previous newline
  SELECT objectid,
         data,
         epos + 1,
         INSTR(data, CHR(10), epos + 1)
  FROM   line_bounds
  WHERE  epos > 0
)
SEARCH DEPTH FIRST BY objectid SET orderid,
lines (objectid, line) AS (
  -- Cut each line out of the CLOB; epos = 0 marks the last line (no newline after it)
  SELECT objectid,
         CASE epos
         WHEN 0
         THEN SUBSTR(data, spos)
         ELSE SUBSTR(data, spos, epos - spos)
         END
  FROM   line_bounds
),
match_bounds (objectid, line, res_spos, res_epos, next_epos) AS (
  -- Positions of the "|" before the match and of the next two "|" delimiters after it
  SELECT objectid,
         line,
         INSTR(line, '|RES_GetResData_Public_ScreenPrint'),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 1),
         INSTR(line, '|', INSTR(line, '|RES_GetResData_Public_ScreenPrint') + 1, 2)
  FROM   lines
)
SELECT objectid,
       SUBSTR(line, res_spos + 1, res_epos - res_spos - 1) AS column1,
       SUBSTR(line, res_epos + 1, next_epos - res_epos - 1) AS column2
FROM   match_bounds
WHERE  res_spos > 0;
For the sample data (note that the CLOB here contains line breaks, which the query above relies on):
CREATE TABLE test_table (objectid, data) AS
SELECT 12345,
EMPTY_CLOB() || '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|3.3 02/17/14080|080|080|031|
|00879|[0-0]?[3.3 02/17/14-3.3 02/17/14]?[0833]|RES_GetResData_Public_ScreenPrint011|F28028081|080|080|080|080|032|-1080|032|-1032|-1080|032|-1080|032|
|-1080|0833|RES_GetResData_Public_ScreenPrint013|F28028007|0820|'
FROM DUAL;
Outputs:
OBJECTID | COLUMN1 | COLUMN2 |
---|---|---|
12345 | RES_GetResData_Public_ScreenPrint010 | F28028079 |
12345 | RES_GetResData_Public_ScreenPrint011 | F28028081 |
12345 | RES_GetResData_Public_ScreenPrint013 | F28028007 |
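If you do want to use regular expressions (please compare the performance of the two solutions on your data), you can match the whole pattern and extract the values of the capture groups through the sixth (subexpression) argument of `REGEXP_SUBSTR`: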
select objectid,
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         1
       ) AS column1,
       REGEXP_SUBSTR(
         DATA,
         '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|',
         1,
         column_value,
         NULL,
         2
       ) AS column2
from   test_table
       cross join table(
         cast(
           multiset(
             select level
             from   dual
             connect by level <= regexp_count(DATA, '\|(RES_GetResData_Public_ScreenPrint.*?)\|(.*?)\|')
           ) as sys.odcinumberlist
         )
       )
Which gives the same output.
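As a rough, self-contained way to try the capture-group approach without creating `test_table`, the sketch below runs against an inline literal. The `demo_data` CTE and its shortened string are assumptions made purely for illustration, and the pattern is a variant of the one in this answer rather than the same pattern: it uses negated character classes (`[^|]*`) instead of `.*?` and leaves the closing `|` unconsumed. Both columns are driven by the same occurrence number (`LEVEL`), whereas the second `REGEXP_SUBSTR` in the question always asks for a fixed occurrence of 3, independent of `column_value`.

```sql
-- Hypothetical inline sample: a shortened, single-row version of the question's data.
WITH demo_data (data) AS (
  SELECT '|-1080|0833|RES_GetResData_Public_ScreenPrint010|F28028079|0820|RES_GetResData_Public_ScreenPrint011|F28028081|080|'
  FROM   DUAL
)
SELECT LEVEL AS occurrence,
       -- Sub-expression 1: the RES_GetResData_Public_ScreenPrint... token itself
       REGEXP_SUBSTR(
         data,
         '\|(RES_GetResData_Public_ScreenPrint[^|]*)\|([^|]*)',
         1, LEVEL, NULL, 1
       ) AS column1,
       -- Sub-expression 2: the value between the next two "|" delimiters
       REGEXP_SUBSTR(
         data,
         '\|(RES_GetResData_Public_ScreenPrint[^|]*)\|([^|]*)',
         1, LEVEL, NULL, 2
       ) AS column2
FROM   demo_data
-- One output row per match; this simple CONNECT BY form assumes a single source row.
CONNECT BY LEVEL <= REGEXP_COUNT(
         data,
         '\|(RES_GetResData_Public_ScreenPrint[^|]*)\|([^|]*)'
       );
```

Because the closing `|` is not consumed, an entry that begins at the very delimiter that ends the previous match would still be found; a pattern that consumes the trailing `|` would skip such an adjacent entry. The sample data in the question has no adjacent entries, so this only matters if your real data does. For a multi-row table, the `CROSS JOIN TABLE(...)` row generator shown above is still needed in place of the bare `CONNECT BY`.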