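I'm trying to use the oh22is ExcelExtractor library to read an Excel file and write a CSV file to Azure Data Lake. The sheet is not laid out as a proper table, and the number of columns is unknown (it grows every month).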
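The only statement I've found that works with this custom extractor is EXTRACT, so my approach is to extract as many Excel columns as possible, starting from [A] ([A], [B] ... [AA], [AB] ...). I'm getting data, but the value of the last column is being repeated.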
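The long `[A] string, [B] string, ...` list does not have to be typed by hand. A minimal Python sketch that generates it (the helper names `column_letters` and `extract_schema` are made up for this example); it only saves typing, since the column count is still fixed in the script:

```python
def column_letters(n: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def extract_schema(num_columns: int) -> str:
    """Build the column list for an EXTRACT statement, every column typed as string."""
    return ", ".join(f"[{column_letters(i)}] string" for i in range(1, num_columns + 1))


# 234 columns covers [A] through [HZ], the range used in the query below.
schema = extract_schema(234)
print(schema)
```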
U-SQL:
USE DATABASE master;
REFERENCE ASSEMBLY [DocumentFormat.OpenXml];
REFERENCE ASSEMBLY [oh22is.Analytics.Formats];
DECLARE @ExcelFile = @SourceFolderPath+@SourceFileName;
@Resources =
EXTRACT [A] string, [B] string, [C] string, [D] string, [E] string, [F] string, [G] string, [H] string, [I] string, [J] string, [K] string, [L] string, [M] string, [N] string, [O] string, [P] string, [Q] string, [R] string, [S] string, [T] string, [U] string, [V] string, [W] string, [X] string, [Y] string, [Z] string, [AA] string, [AB] string, [AC] string, [AD] string, [AE] string, [AF] string, [AG] string, [AH] string, [AI] string, [AJ] string, [AK] string, [AL] string, [AM] string, [AN] string, [AO] string, [AP] string, [AQ] string, [AR] string, [AS] string, [AT] string, [AU] string, [AV] string, [AW] string, [AX] string, [AY] string, [AZ] string, [BA] string, [BB] string, [BC] string, [BD] string, [BE] string, [BF] string, [BG] string, [BH] string, [BI] string, [BJ] string, [BK] string, [BL] string, [BM] string, [BN] string, [BO] string, [BP] string, [BQ] string, [BR] string, [BS] string, [BT] string, [BU] string, [BV] string, [BW] string, [BX] string, [BY] string, [BZ] string, [CA] string, [CB] string, [CC] string, [CD] string, [CE] string, [CF] string, [CG] string, [CH] string, [CI] string, [CJ] string, [CK] string, [CL] string, [CM] string, [CN] string, [CO] string, [CP] string, [CQ] string, [CR] string, [CS] string, [CT] string, [CU] string, [CV] string, [CW] string, [CX] string, [CY] string, [CZ] string, [DA] string, [DB] string, [DC] string, [DD] string, [DE] string, [DF] string, [DG] string, [DH] string, [DI] string, [DJ] string, [DK] string, [DL] string, [DM] string, [DN] string, [DO] string, [DP] string, [DQ] string, [DR] string, [DS] string, [DT] string, [DU] string, [DV] string, [DW] string, [DX] string, [DY] string, [DZ] string, [EA] string, [EB] string, [EC] string, [ED] string, [EE] string, [EF] string, [EG] string, [EH] string, [EI] string, [EJ] string, [EK] string, [EL] string, [EM] string, [EN] string, [EO] string, [EP] string, [EQ] string, [ER] string, [ES] string, [ET] string, [EU] string, [EV] string, [EW] string, [EX] string, [EY] string, 
[EZ] string, [FA] string, [FB] string, [FC] string, [FD] string, [FE] string, [FF] string, [FG] string, [FH] string, [FI] string, [FJ] string, [FK] string, [FL] string, [FM] string, [FN] string, [FO] string, [FP] string, [FQ] string, [FR] string, [FS] string, [FT] string, [FU] string, [FV] string, [FW] string, [FX] string, [FY] string, [FZ] string, [GA] string, [GB] string, [GC] string, [GD] string, [GE] string, [GF] string, [GG] string, [GH] string, [GI] string, [GJ] string, [GK] string, [GL] string, [GM] string, [GN] string, [GO] string, [GP] string, [GQ] string, [GR] string, [GS] string, [GT] string, [GU] string, [GV] string, [GW] string, [GX] string, [GY] string, [GZ] string, [HA] string, [HB] string, [HC] string, [HD] string, [HE] string, [HF] string, [HG] string, [HH] string, [HI] string, [HJ] string, [HK] string, [HL] string, [HM] string, [HN] string, [HO] string, [HP] string, [HQ] string, [HR] string, [HS] string, [HT] string, [HU] string, [HV] string, [HW] string, [HX] string, [HY] string, [HZ] string
FROM @ExcelFile
USING new oh22is.Analytics.Formats.ExcelExtractor("Ark1");
OUTPUT @Resources
TO "/unpivotBasic1.txt"
USING Outputters.Csv();
Output:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9 Column10 Column11 Column12 Column13 Column14 Column15 Column16 Column17 Column18 Column19 Column20 Column21 Column22 Column23 Column24 Column25 Column26 Column27 Column28 Column29 Column30 Column31 Column32 Column33 Column34 Column35 Column36 Column37 Column38 Column39 Column40 Column41 Column42 Column43 ... Column226
SUM: 36,8 40,2 45,6 45,85 55,05 59,1 51,4 49,1 49,3 0 39,8 39,6 44,5 45,2 45 41,5 44,3 46,8 46,7 46,5 46,5 0 41 41,9 41,3 41,1 27,5 17,6 18 12,3 11,3 8,8 8 0 7,8 7,8 7,4 7,4 7,4 7,4 7,4 7,4 ... 7,4
ÅR 2019 2020 2021 2022 ...
Mnd Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November November November November November ... November
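The output is correct except for columns [AN] through [HZ] (columns 40 to 234), which repeat the value of column [AM] (column 39), the last column that holds data in the original Excel sheet. How do I get rid of these repeated values, or what am I doing wrong? The end goal is to unpivot this data into "Year", "Month" and "Sum" columns.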
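For the end goal, here is a minimal Python sketch of the unpivot step, working on the ÅR, Mnd and SUM rows as lists of strings. It assumes (this is not confirmed by the output above) that a year appears only in the column where that year starts and is blank elsewhere, that the label in the first column is dropped before unpivoting, and that SUM uses a decimal comma as in `36,8`. The function names and the year positions in the sample rows are illustrative.

```python
from typing import Optional


def parse_decimal(text: str) -> Optional[float]:
    """Parse a number written with a decimal comma, e.g. '36,8' -> 36.8."""
    text = text.strip()
    if not text:
        return None
    return float(text.replace(",", "."))


def unpivot(years: list[str], months: list[str], sums: list[str]) -> list[tuple[int, str, float]]:
    """Turn the three wide rows into (year, month, sum) records.

    Assumption: the year cell is filled only where a new year starts,
    so the last seen year is carried forward to the following columns.
    """
    records = []
    current_year = None
    # zip stops at the shortest row, so ragged rows do not raise.
    for year_cell, month_cell, sum_cell in zip(years, months, sums):
        if year_cell.strip():
            current_year = int(year_cell)
        month = month_cell.strip()
        value = parse_decimal(sum_cell)
        # Skip columns before the first year and empty trailing columns.
        if current_year is None or not month or value is None:
            continue
        records.append((current_year, month, value))
    return records


# Sample rows including the label column; year positions are assumed.
years = ["ÅR", "2019", "", "", "2020", ""]
months = ["Mnd", "Oktober", "November", "Desember", "Januar", ""]
sums = ["SUM:", "36,8", "40,2", "45,6", "45,85", ""]
records = unpivot(years[1:], months[1:], sums[1:])  # drop the label column
print(records)
```

Skipping empty trailing columns only helps once the extractor stops repeating the last value; with the current output, the repeated `November` / `7,4` columns would still be kept as records.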
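Looking at the code in oh22is.Analytics.Formats, the cause turned out to be obvious: the loop never clears or updates the value of the last cell. I fixed the problem in my "desk drawer": AzureDataLake on GitHub.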
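The actual extractor is written in C# and its source is not reproduced here. The Python sketch below only mirrors the kind of loop described, using hypothetical names (`extract_row_buggy`, `extract_row_fixed`) and assuming the row's cells arrive as a list of the cells that are actually present, to show why every column after the last cell repeats its value and what recomputing the value per column changes.

```python
def extract_row_buggy(cells: list[str], num_columns: int) -> list[str]:
    """Hypothetical extractor loop that reproduces the symptom.

    `value` lives outside the loop and is only overwritten while there are
    cells left, so every column after the last cell repeats that cell.
    """
    row = []
    value = ""
    for i in range(num_columns):
        if i < len(cells):
            value = cells[i]
        row.append(value)
    return row


def extract_row_fixed(cells: list[str], num_columns: int) -> list[str]:
    """Same loop, but the value is recomputed for every column."""
    row = []
    for i in range(num_columns):
        value = cells[i] if i < len(cells) else ""  # empty once the cells run out
        row.append(value)
    return row


cells = ["Mnd", "Oktober", "November"]  # three cells present in the sheet
buggy = extract_row_buggy(cells, 5)
fixed = extract_row_fixed(cells, 5)
print(buggy)
print(fixed)
```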