Oracle SQL Developer regular expression

Question  Votes: 0  Answers: 2

I'm no expert in regular expressions and am looking for help. Thanks in advance.

I want to extract a formatted substring from the description column. Examples below.

From:

my testing on 456897 - Carol M. Smith, Ph.D. 
my testing on 435670 - Ms. Paulina M. Hall
my testing on 980765 - Mr. John Smith
my testing on 14567 - Mrs. Lena C. Callum
my testing on 555777 - Dr. Paul F. Fairlake
234567 - Mr. Ryan M. Palmer, Sr.
123456 - Joyce R. Hilton, Ph.D.

To:

my testing on 456897 - C.Smith 
my testing on 435670 - Ms. P. Hall
my testing on 980765 - Mr. J. Smith
my testing on 14567 - Mrs. L. Callum
my testing on 555777 - Dr. P. Fairlake
234567 - Mr. R. Palmer
123456 - J. Hilton

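My query works for the first and last records. However, the ones with titles are a bit more complicated.

For the records with titles, I need to keep the title, the initial of the first name, and the last name.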

SELECT description,
       CASE
           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN REGEXP_REPLACE(description, '(Ms\.|Mr\.|Mrs\.|Dr\.[A-Z][a-z]+ [A-Z]\.)')
           WHEN NOT REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)')  THEN REGEXP_REPLACE(description, '(\w)\w*\W+(\w)\w*\W+(\w+),.*', '\1. \3')
           ELSE 'some other validation needed'
       END AS order_regex
       from mytable;

Thanks again for any suggestions. K

oracle oracle11g oracle10g
2 Answers

0 votes

For this exact example, you can use something like this:

select
  description,
       CASE
           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN 
              REGEXP_REPLACE(description,  '(Ms\.|Mr\.|Mrs\.|Dr\.) ([A-Z])[a-zA-Z. ]+ ([A-Za-z]+)', '\1 \2. \3')
           WHEN REGEXP_LIKE(description, ', (Ph\.D\.|Sr\.)')  THEN 
              REGEXP_REPLACE(description, '([A-Z])[a-z]+ ([A-Z]\.)? ([A-Z][a-z]+), (Ph\.D\.|Sr\.)', '\1. \3')
           ELSE 'some other validation needed'
       END AS order_regex
from t1

Edit: a more general version for multi-part names:

select
  description,
       CASE
           WHEN REGEXP_LIKE(description, '(Mr\.|Ms\.|Mrs\.|Dr\.)') THEN 
              REGEXP_REPLACE(description,  '(Ms\.|Mr\.|Mrs\.|Dr\.) ([A-Z])[a-zA-Z. ]+ ([A-Za-z]+)', '\1 \2. \3')
           WHEN REGEXP_LIKE(description, ', (Ph\.D\.|Sr\.)')  THEN 
              REGEXP_REPLACE(description, '([A-Z])[a-zA-Z. ]* ([A-Z][a-z]+), (Ph\.D\.|Sr\.)', '\1. \2')
           ELSE 'some other validation needed'
       END AS order_regex
from t1

Demo here (v2)

But in general, names are hard to parse, and I'm afraid a simple set of regular expressions won't be enough.
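
One gap worth noting in the queries above: a titled row that also carries a suffix, such as '234567 - Mr. Ryan M. Palmer, Sr.', takes the first CASE branch, which shortens the name but appears to leave ', Sr.' in place. A minimal sketch of one way to handle that is to wrap the call in a second REGEXP_REPLACE that strips a trailing suffix. The inline t1 rows and the suffix list (Ph.D., Sr.) are assumptions taken from the question's samples, not a general solution.

```sql
-- Sketch only: t1 is rebuilt inline from two of the question's sample rows.
with t1 as (
  select '234567 - Mr. Ryan M. Palmer, Sr.' description from dual union all
  select 'my testing on 435670 - Ms. Paulina M. Hall' from dual
)
select
  description,
  regexp_replace(
    -- inner call: shorten the titled name with the same pattern as the answer
    regexp_replace(description,
                   '(Ms\.|Mr\.|Mrs\.|Dr\.) ([A-Z])[a-zA-Z. ]+ ([A-Za-z]+)',
                   '\1 \2. \3'),
    -- outer call: no replacement string, so a trailing ", Ph.D." or ", Sr."
    -- (plus any trailing spaces) is simply removed
    ', (Ph\.D\.|Sr\.) *$'
  ) as order_regex
from t1;
```

With these two rows the query should return 234567 - Mr. R. Palmer and my testing on 435670 - Ms. P. Hall. Any suffix outside the assumed list would still pass through unchanged.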


0 votes

I would do it like this:

select
  t1.*
 ,regexp_replace(
     t1.description
    ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
    ,'\1- \2\4. \6'
    ) subs
from t1

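A short description of this regular expression:

  1. ([^-]+)-
    - matches the first part of the string, up to the first "-" (subexpression #1; the "-" itself is outside the group)
  2. \s*
    - any number of whitespace characters
  3. ((Mr|Ms|Mrs|Dr)[.]\s*)?
    - checks whether Mr, Ms, Mrs or Dr followed by "." is present and, if so, captures it together with the trailing whitespace as subexpression #2 (the inner alternation is subexpression #3)
  4. (\w)\w*
    - matches the first name and captures its first letter as subexpression #4
  5. (\s[a-zA-Z.]*)*
    - any number of words between the first and last names (subexpression #5)
  6. \s(\w+)(,.*|$)
    - matches the last name, that is, the last word before "," or the end of the string, as subexpression #6; the remainder (,.*|$) is subexpression #7 and is dropped by the replacement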
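
To check the subexpression numbering against a real row, the groups can be pulled out one at a time. This is only a sketch, and it assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument that selects a subexpression. The pat CTE and the column aliases are made up for illustration.

```sql
with t1 as (
  select 'my testing on 14567 - Mrs. Lena C. Callum' description from dual
),
pat as (
  -- same pattern as in the answer, kept in one place so it is not repeated
  select '([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)' p
  from dual
)
select
  -- arguments: source, pattern, start position, occurrence, match parameter, subexpression
  regexp_substr(t1.description, pat.p, 1, 1, null, 2) as title_part,    -- title, dot and trailing space
  regexp_substr(t1.description, pat.p, 1, 1, null, 4) as first_initial, -- first letter of the first name
  regexp_substr(t1.description, pat.p, 1, 1, null, 6) as last_name      -- last word before "," or end
from t1 cross join pat;
```

For this row the three columns should come back as 'Mrs. ', 'L' and 'Callum', which are the pieces that the replacement string '\1- \2\4. \6' stitches back together.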

Full test case:

with t1 as (
select 'my testing on 456897 - Carol M. Smith, Ph.D. ' description from dual union all
select 'my testing on 435670 - Ms. Paulina M. Hall' from dual union all
select 'my testing on 980765 - Mr. John Smith' from dual union all
select 'my testing on 14567 - Mrs. Lena C. Callum' from dual union all
select 'my testing on 555777 - Dr. Paul F. Fairlake' from dual union all
select '234567 - Mr. Ryan M. Palmer, Sr.' from dual union all
select '123456 - Joyce R. Hilton, Ph.D.' from dual  
)
select
  t1.*
 ,regexp_replace(
     t1.description
    ,'([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)'
    ,'\1- \2\4. \6'
    ) subs
from t1;

DBFiddle: https://dbfiddle.uk/HNHHzGR4
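
REGEXP_REPLACE returns the input unchanged when the pattern does not match, so rows in an unexpected format would pass through silently. If the 'some other validation needed' marker from the question is still wanted, one possible sketch is to guard the call with REGEXP_LIKE. The second sample row and the pat CTE are invented for illustration.

```sql
with t1 as (
  select '123456 - Joyce R. Hilton, Ph.D.' description from dual union all
  select 'no dash or name here' from dual  -- hypothetical row that does not fit the format
),
pat as (
  select '([^-]+)-\s*((Mr|Ms|Mrs|Dr)[.]\s*)?(\w)\w*(\s[a-zA-Z.]*)*\s(\w+)(,.*|$)' p
  from dual
)
select
  t1.description,
  case
    -- only rewrite rows the pattern actually matches
    when regexp_like(t1.description, pat.p)
      then regexp_replace(t1.description, pat.p, '\1- \2\4. \6')
    else 'some other validation needed'
  end as subs
from t1 cross join pat;
```

The first row should come back as 123456 - J. Hilton, and the second, which has no "-", should fall through to the marker text.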
