How to avoid repeating the last column's value when reading an Excel file with an unknown number of columns in U-SQL


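I am trying to read an Excel file with the oh22is ExcelExtractor library and write it out as a CSV file in Azure Data Lake. The table layout of the Excel sheet is awkward, and the number of columns is unknown (it grows every month).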

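The only keyword I have found that works with this custom extractor is EXTRACT. My approach is to extract as many Excel columns as possible, starting from [A] ([A], [B] ... [AA], [AB] ...). I do get the data, but the value of the last column is repeated.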

U-SQL:

USE DATABASE master;

REFERENCE ASSEMBLY [DocumentFormat.OpenXml];
REFERENCE ASSEMBLY [oh22is.Analytics.Formats];

DECLARE @ExcelFile = @SourceFolderPath+@SourceFileName;

@Resources = 
    EXTRACT [A] string, [B] string, [C] string, [D] string, [E] string, [F] string, [G] string, [H] string, [I] string, [J] string, [K] string, [L] string, [M] string, [N] string, [O] string, [P] string, [Q] string, [R] string, [S] string, [T] string, [U] string, [V] string, [W] string, [X] string, [Y] string, [Z] string,
            [AA] string, [AB] string, [AC] string, [AD] string, [AE] string, [AF] string, [AG] string, [AH] string, [AI] string, [AJ] string, [AK] string, [AL] string, [AM] string, [AN] string, [AO] string, [AP] string, [AQ] string, [AR] string, [AS] string, [AT] string, [AU] string, [AV] string, [AW] string, [AX] string, [AY] string, [AZ] string,
            [BA] string, [BB] string, [BC] string, [BD] string, [BE] string, [BF] string, [BG] string, [BH] string, [BI] string, [BJ] string, [BK] string, [BL] string, [BM] string, [BN] string, [BO] string, [BP] string, [BQ] string, [BR] string, [BS] string, [BT] string, [BU] string, [BV] string, [BW] string, [BX] string, [BY] string, [BZ] string,
            [CA] string, [CB] string, [CC] string, [CD] string, [CE] string, [CF] string, [CG] string, [CH] string, [CI] string, [CJ] string, [CK] string, [CL] string, [CM] string, [CN] string, [CO] string, [CP] string, [CQ] string, [CR] string, [CS] string, [CT] string, [CU] string, [CV] string, [CW] string, [CX] string, [CY] string, [CZ] string,
            [DA] string, [DB] string, [DC] string, [DD] string, [DE] string, [DF] string, [DG] string, [DH] string, [DI] string, [DJ] string, [DK] string, [DL] string, [DM] string, [DN] string, [DO] string, [DP] string, [DQ] string, [DR] string, [DS] string, [DT] string, [DU] string, [DV] string, [DW] string, [DX] string, [DY] string, [DZ] string,
            [EA] string, [EB] string, [EC] string, [ED] string, [EE] string, [EF] string, [EG] string, [EH] string, [EI] string, [EJ] string, [EK] string, [EL] string, [EM] string, [EN] string, [EO] string, [EP] string, [EQ] string, [ER] string, [ES] string, [ET] string, [EU] string, [EV] string, [EW] string, [EX] string, [EY] string, [EZ] string,
            [FA] string, [FB] string, [FC] string, [FD] string, [FE] string, [FF] string, [FG] string, [FH] string, [FI] string, [FJ] string, [FK] string, [FL] string, [FM] string, [FN] string, [FO] string, [FP] string, [FQ] string, [FR] string, [FS] string, [FT] string, [FU] string, [FV] string, [FW] string, [FX] string, [FY] string, [FZ] string,
            [GA] string, [GB] string, [GC] string, [GD] string, [GE] string, [GF] string, [GG] string, [GH] string, [GI] string, [GJ] string, [GK] string, [GL] string, [GM] string, [GN] string, [GO] string, [GP] string, [GQ] string, [GR] string, [GS] string, [GT] string, [GU] string, [GV] string, [GW] string, [GX] string, [GY] string, [GZ] string,
            [HA] string, [HB] string, [HC] string, [HD] string, [HE] string, [HF] string, [HG] string, [HH] string, [HI] string, [HJ] string, [HK] string, [HL] string, [HM] string, [HN] string, [HO] string, [HP] string, [HQ] string, [HR] string, [HS] string, [HT] string, [HU] string, [HV] string, [HW] string, [HX] string, [HY] string, [HZ] string
    FROM @ExcelFile
    USING new oh22is.Analytics.Formats.ExcelExtractor("Ark1");

OUTPUT @Resources
TO "/unpivotBasic1.txt"
USING Outputters.Csv();
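Rather than typing all 234 column names by hand, the EXTRACT column list can be generated and pasted into the script. The following is a minimal Python sketch; the helper names `column_letters`, `column_index` and `extract_schema` are hypothetical and not part of U-SQL or the oh22is library. The same arithmetic confirms the column numbers used in the question: [AM] is column 39, [AN] is column 40 and [HZ] is column 234.

```python
def column_letters(index: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def column_index(letters: str) -> int:
    """Convert Excel column letters to a 1-based column number (AM -> 39)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def extract_schema(last_column: str) -> str:
    """Build '[A] string, [B] string, ...' up to and including last_column."""
    count = column_index(last_column)
    return ", ".join(f"[{column_letters(i)}] string" for i in range(1, count + 1))


if __name__ == "__main__":
    # Prints the column list for the EXTRACT statement, [A] through [HZ].
    print(extract_schema("HZ"))
```

With this approach the schema is still a fixed upper bound, so it has to be regenerated if the sheet ever grows past the last declared column.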

Output:

Column1   Column2   Column3   Column4   Column5   Column6   Column7   Column8   Column9   Column10   Column11   Column12   Column13   Column14   Column15   Column16   Column17   Column18   Column19   Column20   Column21   Column22   Column23   Column24   Column25   Column26   Column27   Column28   Column29   Column30   Column31   Column32   Column33   Column34   Column35   Column36   Column37   Column38   Column39   Column40   Column41   Column42   Column43   ...   Column226
SUM:         36,8      40,2      45,6     45,85     55,05      59,1      51,4      49,1       49,3          0       39,8       39,6       44,5       45,2         45       41,5       44,3       46,8       46,7       46,5       46,5          0         41       41,9       41,3       41,1       27,5       17,6         18       12,3       11,3        8,8          8          0        7,8        7,8        7,4        7,4        7,4        7,4        7,4        7,4   ...         7,4
ÅR           2019                          2020                                                                                                                            2021                                                                                                                                2022                                                                                                                                                             ...  
Mnd       Oktober   November  Desember   Januar    Februar     Mars      April      Mai       Juni       Juli     August  September    Oktober    November   Desember    Januar     Februar    Mars        April       Mai        Juni       Juli     August  September    Oktober    November   Desember    Januar    Februar       Mars       April       Mai       Juni       Juli     August  September    Oktober   November   November   November   November   November   ...    November

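The output is correct except for columns [AN] through [HZ] (columns 40 to 234), which repeat the value of column [AM] (column 39), the last column that contains data in the original Excel file. How can I get rid of these repeated values, or what am I doing wrong? The end goal is to unpivot this data into "Year", "Month" and "Sum" columns.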
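The end goal of turning the SUM:, ÅR and Mnd rows into Year, Month and Sum records is essentially a transpose. Below is a minimal Python sketch of that logic, assuming the extractor has already been fixed so that trailing columns come back empty (None or "") rather than repeated, and that each row is a list whose first element is the row label. The names `unpivot` and `to_number` and this row layout are assumptions for illustration. Because the year only appears above the first month of each year, it is carried forward; the Norwegian decimal comma is converted to a float.

```python
from typing import List, Optional, Tuple

Cell = Optional[str]


def to_number(text: Cell) -> Optional[float]:
    """Parse a decimal-comma value such as '36,8'; empty cells become None."""
    if not text:
        return None
    return float(text.replace(",", "."))


def unpivot(sums: List[Cell], years: List[Cell],
            months: List[Cell]) -> List[Tuple[Cell, str, Optional[float]]]:
    """Turn the SUM:/ÅR/Mnd rows into (year, month, sum) records."""
    records: List[Tuple[Cell, str, Optional[float]]] = []
    current_year: Cell = None
    # Column 0 holds the row labels, so start at column 1.
    for s, y, m in zip(sums[1:], years[1:], months[1:]):
        if y:
            current_year = y  # a new year starts in this column
        if not m:
            continue  # no month: an empty trailing column, skip it
        records.append((current_year, m, to_number(s)))
    return records


if __name__ == "__main__":
    # First few columns of the output shown in the question, plus one empty column.
    sums = ["SUM:", "36,8", "40,2", "45,6", "45,85", None]
    years = ["ÅR", "2019", None, None, "2020", None]
    months = ["Mnd", "Oktober", "November", "Desember", "Januar", None]
    for record in unpivot(sums, years, months):
        print(record)
```

With the unfixed extractor this sketch would wrongly emit the repeated "November" columns, which is why the repeated values have to be dealt with first.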

azure azure-data-lake u-sql
1 Answer

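Looking at the code in oh22is.Analytics.Formats, the cause is obvious: the loop never clears or updates the value of the last cell, so that value is carried into every remaining column. I fixed the problem in my own "desk drawer" repository: AzureDataLake-GitHub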
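The answer does not show the library code, so the following is only a hypothetical Python illustration of the kind of loop bug described, not the oh22is implementation. A variable that is overwritten only while the row still has cells keeps its last value for every remaining column of the schema; resetting it for each column makes the trailing columns empty instead. The function names are made up for the example.

```python
from typing import List, Optional


def fill_row_buggy(cells: List[str], width: int) -> List[Optional[str]]:
    """Hypothetical buggy loop: `value` is only overwritten while the row
    still has cells, so the last cell's value leaks into later columns."""
    row: List[Optional[str]] = []
    value: Optional[str] = None
    for col in range(width):
        if col < len(cells):
            value = cells[col]
        row.append(value)
    return row


def fill_row_fixed(cells: List[str], width: int) -> List[Optional[str]]:
    """Same loop, but each column gets None when the row has no cell for it."""
    row: List[Optional[str]] = []
    for col in range(width):
        value = cells[col] if col < len(cells) else None  # reset per column
        row.append(value)
    return row


if __name__ == "__main__":
    cells = ["SUM:", "36,8", "7,4"]  # a row with 3 cells, schema of 5 columns
    print(fill_row_buggy(cells, 5))  # ['SUM:', '36,8', '7,4', '7,4', '7,4']
    print(fill_row_fixed(cells, 5))  # ['SUM:', '36,8', '7,4', None, None]
```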

© www.soinside.com 2019 - 2024. All rights reserved.