TSQL Flatten Split Tab Delimited Column


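I am importing error-log flat files into my SQL Server and need to parse a tab-delimited column into multiple columns. Borrowing heavily from this question (SQL Split Tab Delimited Column), and in particular from @Lobo's answer, I want to accomplish a couple of things: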

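  • Capture the maximum number of columns that STRING_SPLIT() produces from [Column 0], so that I can build a dynamic PIVOT in case there are more columns than I currently know about
  • Properly recombine the parsed [Column 0] columns with the rest of the record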

I can live without the first goal (a dynamic column count) for now; it is the second goal that I am having trouble with.
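As an aside on the first goal, a minimal sketch of one way to measure the widest row without splitting anything: count the tab characters in each string and add one. The two sample strings below are shortened, hypothetical stand-ins for real [Column 0] values, and the alias names (S, Line, MaxColumnCount) are made up for illustration.

```sql
-- Sketch only: the strings are hypothetical stand-ins for [Column 0] values.
-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces,
-- which could skew the difference once the tabs have been removed.
SELECT MAX(
           (DATALENGTH(S.Line)
            - DATALENGTH(REPLACE(S.Line, NCHAR(9), N''))) / 2  -- nvarchar: 2 bytes per tab
           + 1                                                 -- pieces = tabs + 1
       ) AS MaxColumnCount
FROM (VALUES
        (CAST(N'1' + NCHAR(9) + N'first message' + NCHAR(9) + NCHAR(9) AS nvarchar(4000))),
        (N'1' + NCHAR(9) + N'second message' + NCHAR(9) + N'0' + NCHAR(9) + N'218')
     ) AS S(Line);
```

Each stand-in string contains three tabs, so each splits into four pieces. Against the real table the same expression would be applied to [Column 0], and the resulting number could then drive the column list of a dynamic PIVOT.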

DECLARE @SAMPLE_TABLE table(
    [Column 0] nvarchar(4000),
    [Filename] nvarchar(260),
    FileExtention varchar(255),
    DateTimeStamp datetime,
    CustomerNumber varchar(255),
    FileType varchar(255),
    ImportSetNumber varchar(255)
)

Some sample data for the table:

[Column 0]                                                                                                                                | [Filename]                                                       | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1<tab>Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.<tab><tab>| E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.<tab>0<tab>218                    | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.<tab>0<tab>17                     | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186

Let the games begin:

;WITH
CTE_Columns AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 'MyRowID',
            [Filename],
            FileExtention,
            DateTimeStamp,
            CustomerNumber,
            FileType,
            ImportSetNumber,
            A.ColID 'ColumnNumber',
            A.Cols 'ColumnValue'
    FROM @SAMPLE_TABLE
    CROSS APPLY (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
                value [Cols]
                FROM STRING_SPLIT([Column 0], CHAR(9))  -- split by tab character
    )A
)

SELECT MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        NULLIF(TRIM([1]), '') 'FirstColumn',
        NULLIF(TRIM([2]), '') 'SecondColumn',
        NULLIF(TRIM([3]), '') 'ThirdColumn',
        NULLIF(TRIM([4]), '') 'FourthColumn'
FROM (
        SELECT MyRowID,
                [Filename],
                FileExtention,
                DateTimeStamp,
                CustomerNumber,
                FileType,
                ImportSetNumber,
                ColumnNumber,
                ColumnValue
        FROM CTE_Columns
)Q
PIVOT(MAX(Q.ColumnValue) FOR ColumnNumber IN([1], [2], [3], [4])) PIV
ORDER BY CustomerNumber,
            ImportSetNumber

This query produces the following result set:

MyRowID | Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
2       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
3       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
4       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
5       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
6       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
7       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | 0           | NULL
8       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | 218
9       | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | NULL                                                                                                                       | NULL        | NULL
10      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
11      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | 0           | NULL
12      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | NULL        | 17

Based on the result set above, rows 1-4 should be a single record, rows 5-8 a second record, and rows 9-12 a third, which would give me my desired end state:

Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | 0           | 218
E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | 0           | 17

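I believe this is just a simple matter of grouping correctly, but I am not sure what to group on, or where the grouping belongs (whether in a partition or in a standard GROUP BY clause somewhere).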
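One plausible explanation, offered as a sketch rather than a confirmed answer: MyRowID is assigned after the CROSS APPLY, so every split piece receives its own ID. PIVOT implicitly groups by every column it is not aggregating or spreading, so with a unique MyRowID per piece nothing is ever folded together. Grouping on the file metadata alone would not work either, because the first two sample rows share identical metadata. Numbering the source rows before splitting gives all pieces of a record the same key. The CTE name CTE_Rows, the variable @t, and the INSERT that rebuilds the sample data are additions made for this example.

```sql
DECLARE @SAMPLE_TABLE table(
    [Column 0] nvarchar(4000),
    [Filename] nvarchar(260),
    FileExtention varchar(255),
    DateTimeStamp datetime,
    CustomerNumber varchar(255),
    FileType varchar(255),
    ImportSetNumber varchar(255)
);

DECLARE @t nchar(1) = NCHAR(9);  -- tab character, used to rebuild the sample rows

INSERT INTO @SAMPLE_TABLE
    ([Column 0], [Filename], FileExtention, DateTimeStamp, CustomerNumber, FileType, ImportSetNumber)
VALUES
(N'1' + @t + N'Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.' + @t + @t,
 N'E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err', 'err', '2023-03-30T11:36:36.000', '10047', 'OHF', '03300001'),
(N'1' + @t + N'Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.' + @t + N'0' + @t + N'218',
 N'E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err', 'err', '2023-03-30T11:36:36.000', '10047', 'OHF', '03300001'),
(N'1' + @t + N'Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.' + @t + N'0' + @t + N'17',
 N'E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err', 'err', '2023-04-04T08:49:26.000', '18120', 'OHF', '04040186');

WITH
CTE_Rows AS (
    -- Number each source row BEFORE splitting, so all pieces inherit one ID.
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS MyRowID,
           [Column 0], [Filename], FileExtention, DateTimeStamp,
           CustomerNumber, FileType, ImportSetNumber
    FROM @SAMPLE_TABLE
),
CTE_Columns AS (
    SELECT R.MyRowID, R.[Filename], R.FileExtention, R.DateTimeStamp,
           R.CustomerNumber, R.FileType, R.ImportSetNumber,
           A.ColID AS ColumnNumber,
           A.Cols  AS ColumnValue
    FROM CTE_Rows AS R
    CROSS APPLY (
        -- Caveat: without enable_ordinal (SQL Server 2022+), STRING_SPLIT does
        -- not document its output order, so this numbering is an assumption.
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
               value AS Cols
        FROM STRING_SPLIT(R.[Column 0], CHAR(9))  -- split by tab character
    ) AS A
)
SELECT [Filename], FileExtention, DateTimeStamp,
       CustomerNumber, FileType, ImportSetNumber,
       NULLIF(TRIM([1]), '') AS FirstColumn,
       NULLIF(TRIM([2]), '') AS SecondColumn,
       NULLIF(TRIM([3]), '') AS ThirdColumn,
       NULLIF(TRIM([4]), '') AS FourthColumn
FROM CTE_Columns
PIVOT (MAX(ColumnValue) FOR ColumnNumber IN ([1], [2], [3], [4])) AS PIV
ORDER BY CustomerNumber, ImportSetNumber, MyRowID;
```

With MyRowID shared by all pieces of a record, the implicit grouping in PIVOT should yield one row per source record, in the shape of the desired end state above. No explicit GROUP BY or PARTITION BY is needed under this approach; the only change in substance is where the row number is generated.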

csv tsql tabs delimiter sql-server-2019