SQL Server regular expression for parsing complex formatted text

Question (votes: 0, answers: 2)

I have a table similar to the following in SQL Server 2016:

CREATE TABLE #SampleValues (TextID INT, Comment VARCHAR(MAX) )
INSERT INTO #SampleValues (TextID, Comment)
SELECT 1, 'user 1 has done crosswalk 99220 to 99215 and submitted' UNION
SELECT 2, 'got update that crossedwalked 99308 to 99221' UNION
SELECT 3, '99255 CROSSWALKED TO 99223' UNION
SELECT 4, 'proposed crosswalk 99219 to 99214 and clean' UNION
SELECT 5, 'tested and confiimed cross walked code from 99223 to 99255' UNION
SELECT 6, 'User 2 Crosswalked codes change 99254 to 99222' UNION
SELECT 7, 'User3cross walked code from 99232 to 99307' UNION
SELECT 8, 'Updated to 99307'

The expected result would look like the screenshot below. [image]

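The Comment value should be in one of the following formats (case-insensitive). If a comment does not follow any of these formats, the expected result is NULL.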

<some_pre-text>crossedwalked Number1 to Number2<some_post-text>
<some_pre-text>crosswalk Number1 to Number2<some_post-text>
<some_pre-text>Crosswalked codes change Number1 to Number2<some_post-text>
<some_pre-text>cross walked code from Number1 to Number2<some_post-text>
<some_pre-text>Number1 CROSSWALKED TO Number2<some_post-text>

I found some simple regular expression examples, but none that show how to handle formats this complex. Any ideas on how to implement this?

sql sql-server regex
2 answers

1 vote

SQL Server is very poor at string processing. If I assume that "to" always comes directly before the second number, as in your examples, then:

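select *,
       -- ' to ' is 4 characters and Number2 is 5 digits, so the match
       -- ends 8 characters after the position where ' to ' starts
       left(v.pat, charindex(' to ', v.pat) + 8) as Extracted
from #SampleValues t cross apply
     (values (case when comment like '%[0-9][0-9][0-9][0-9][0-9]%to%[0-9][0-9][0-9][0-9][0-9]%'
                   -- remove everything before the first 5-digit number
                   then stuff(comment,
                              1,
                              patindex('%[0-9][0-9][0-9][0-9][0-9]%to%[0-9][0-9][0-9][0-9][0-9]%', comment) - 1,
                              ''
                             )
              end)
     ) v(pat)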

Here is a db<>fiddle.
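
To make the position arithmetic easier to follow, here is a minimal sketch that traces the intermediate values for a single comment (row 1 from the question). The variable names `@comment`, `@start`, and `@pat` are introduced only for this illustration and are not part of the answer's query.

```sql
DECLARE @comment VARCHAR(200) = 'user 1 has done crosswalk 99220 to 99215 and submitted';

-- Position of the first 5-digit run that is later followed by "to"
-- and another 5-digit run (0 if the comment does not match)
DECLARE @start INT =
    PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%to%[0-9][0-9][0-9][0-9][0-9]%', @comment);

-- STUFF deletes the @start - 1 leading characters,
-- so the remaining string begins at Number1
DECLARE @pat VARCHAR(200) = STUFF(@comment, 1, @start - 1, '');

SELECT @start                  AS StartOfNumber1,   -- 27
       @pat                    AS TrimmedComment,   -- '99220 to 99215 and submitted'
       CHARINDEX(' to ', @pat) AS StartOfTo;        -- 6
```

With `' to '` starting at position 6, the four characters of `' to '` plus the five digits of Number2 end at position 14, which is where the extracted text `99220 to 99215` stops.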


1 vote

Try the following:

-- First number: from the first 5-digit run up to the next space (NULL if no space follows)
select comment,  NULLIF(SUBSTRING(comment
                            , PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', comment)
                                , CASE WHEN CHARINDEX(' ', comment, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', comment)) - PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', comment) < 0 THEN 0
                                    ELSE CHARINDEX(' ', comment, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', comment)) - PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', comment) END ), '')
-- Second number: the same search on the reversed string, then reversed back
+ ' to ' + NULLIF(REVERSE(SUBSTRING(REVERSE(comment)
                        , PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', REVERSE(comment))
                            , CASE WHEN CHARINDEX(' ', REVERSE(comment), PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', REVERSE(comment))) - PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', REVERSE(comment)) < 0 THEN 0
                                ELSE CHARINDEX(' ', REVERSE(comment), PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', REVERSE(comment))) - PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', REVERSE(comment)) END )), '') AS Extracted
FROM #SampleValues

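Steps involved:

  1. Find the starting position of the first number with five or more digits

  2. From that position (from step 1), find the next space

  3. Repeat the two steps above on the reversed string

  4. Concatenate as needed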

Please find the db<>fiddle here.
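
The four steps can be traced on a single comment (row 2 from the question) with a minimal sketch like the one below. The variable names are hypothetical and exist only for this illustration. The sketch also leaves out the `CASE` and `NULLIF` guards from the full query, so it assumes the comment contains two 5-digit numbers and that a space follows the first one.

```sql
DECLARE @comment VARCHAR(200) = 'got update that crossedwalked 99308 to 99221';
DECLARE @digits  VARCHAR(50)  = '%[0-9][0-9][0-9][0-9][0-9]%';

-- Step 1: starting position of the first run of five digits
DECLARE @start INT = PATINDEX(@digits, @comment);
-- Step 2: next space at or after that position
DECLARE @space INT = CHARINDEX(' ', @comment, @start);

-- Step 3: the same two steps on the reversed string
DECLARE @rev      VARCHAR(200) = REVERSE(@comment);
DECLARE @revStart INT = PATINDEX(@digits, @rev);
DECLARE @revSpace INT = CHARINDEX(' ', @rev, @revStart);

-- Step 4: cut out both numbers and concatenate them
SELECT SUBSTRING(@comment, @start, @space - @start)
       + ' to '
       + REVERSE(SUBSTRING(@rev, @revStart, @revSpace - @revStart)) AS Extracted;
-- returns '99308 to 99221'
```

Without the guards, a comment such as 'Updated to 99307' would pass a negative length to `SUBSTRING` and raise an error, which is why the full query wraps the length in `CASE ... THEN 0` and turns the resulting empty string into NULL with `NULLIF`.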
