I am trying to read an Excel file with the oh22is ExcelExtractor library and write it to a csv file in Azure Data Lake. The Excel file has an awkward cross-tab layout, and the number of columns is not known in advance (it grows by one every month).
The only keyword I have come across for this custom extractor is EXTRACT. My approach was to extract more Excel columns than I currently need, starting from [A] ([A], [B] ... [AA], [AB] ...). I do get my data, but the problem is that the values of the last column are repeated.
U-SQL:
USE DATABASE master;
REFERENCE ASSEMBLY [DocumentFormat.OpenXml];
REFERENCE ASSEMBLY [oh22is.Analytics.Formats];
DECLARE @ExcelFile = @SourceFolderPath+@SourceFileName;
@Resources =
EXTRACT [A] string, [B] string, [C] string, [D] string, [E] string, [F] string, [G] string, [H] string, [I] string, [J] string, [K] string, [L] string, [M] string, [N] string, [O] string, [P] string, [Q] string, [R] string, [S] string, [T] string, [U] string, [V] string, [W] string, [X] string, [Y] string, [Z] string, [AA] string, [AB] string, [AC] string, [AD] string, [AE] string, [AF] string, [AG] string, [AH] string, [AI] string, [AJ] string, [AK] string, [AL] string, [AM] string, [AN] string, [AO] string, [AP] string, [AQ] string, [AR] string, [AS] string, [AT] string, [AU] string, [AV] string, [AW] string, [AX] string, [AY] string, [AZ] string, [BA] string, [BB] string, [BC] string, [BD] string, [BE] string, [BF] string, [BG] string, [BH] string, [BI] string, [BJ] string, [BK] string, [BL] string, [BM] string, [BN] string, [BO] string, [BP] string, [BQ] string, [BR] string, [BS] string, [BT] string, [BU] string, [BV] string, [BW] string, [BX] string, [BY] string, [BZ] string, [CA] string, [CB] string, [CC] string, [CD] string, [CE] string, [CF] string, [CG] string, [CH] string, [CI] string, [CJ] string, [CK] string, [CL] string, [CM] string, [CN] string, [CO] string, [CP] string, [CQ] string, [CR] string, [CS] string, [CT] string, [CU] string, [CV] string, [CW] string, [CX] string, [CY] string, [CZ] string, [DA] string, [DB] string, [DC] string, [DD] string, [DE] string, [DF] string, [DG] string, [DH] string, [DI] string, [DJ] string, [DK] string, [DL] string, [DM] string, [DN] string, [DO] string, [DP] string, [DQ] string, [DR] string, [DS] string, [DT] string, [DU] string, [DV] string, [DW] string, [DX] string, [DY] string, [DZ] string, [EA] string, [EB] string, [EC] string, [ED] string, [EE] string, [EF] string, [EG] string, [EH] string, [EI] string, [EJ] string, [EK] string, [EL] string, [EM] string, [EN] string, [EO] string, [EP] string, [EQ] string, [ER] string, [ES] string, [ET] string, [EU] string, [EV] string, [EW] string, [EX] string, [EY] string, 
[EZ] string, [FA] string, [FB] string, [FC] string, [FD] string, [FE] string, [FF] string, [FG] string, [FH] string, [FI] string, [FJ] string, [FK] string, [FL] string, [FM] string, [FN] string, [FO] string, [FP] string, [FQ] string, [FR] string, [FS] string, [FT] string, [FU] string, [FV] string, [FW] string, [FX] string, [FY] string, [FZ] string, [GA] string, [GB] string, [GC] string, [GD] string, [GE] string, [GF] string, [GG] string, [GH] string, [GI] string, [GJ] string, [GK] string, [GL] string, [GM] string, [GN] string, [GO] string, [GP] string, [GQ] string, [GR] string, [GS] string, [GT] string, [GU] string, [GV] string, [GW] string, [GX] string, [GY] string, [GZ] string, [HA] string, [HB] string, [HC] string, [HD] string, [HE] string, [HF] string, [HG] string, [HH] string, [HI] string, [HJ] string, [HK] string, [HL] string, [HM] string, [HN] string, [HO] string, [HP] string, [HQ] string, [HR] string, [HS] string, [HT] string, [HU] string, [HV] string, [HW] string, [HX] string, [HY] string, [HZ] string
FROM @ExcelFile
USING new oh22is.Analytics.Formats.ExcelExtractor("Ark1");
OUTPUT @Resources
TO "/unpivotBasic1.txt"
USING Outputters.Csv();
Output:
Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8 Column9 Column10 Column11 Column12 Column13 Column14 Column15 Column16 Column17 Column18 Column19 Column20 Column21 Column22 Column23 Column24 Column25 Column26 Column27 Column28 Column29 Column30 Column31 Column32 Column33 Column34 Column35 Column36 Column37 Column38 Column39 Column40 Column41 Column42 Column43 ... Column226
SUM: 36,8 40,2 45,6 45,85 55,05 59,1 51,4 49,1 49,3 0 39,8 39,6 44,5 45,2 45 41,5 44,3 46,8 46,7 46,5 46,5 0 41 41,9 41,3 41,1 27,5 17,6 18 12,3 11,3 8,8 8 0 7,8 7,8 7,4 7,4 7,4 7,4 7,4 7,4 ... 7,4
ÅR 2019 2020 2021 2022 ...
Mnd Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November Desember Januar Februar Mars April Mai Juni Juli August September Oktober November November November November November ... November
The output is correct except for columns [AN] through [HZ] (Column40 through Column234), which repeat the values from column [AM] (Column39), the last column that contains data in the original Excel file. How can I get rid of these repeated values, or what am I doing wrong? The end goal is to unpivot this data into the columns "Year", "Month" and "SUM".
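To make the target shape concrete, here is a minimal sketch of the logic I have in mind, written in Python rather than U-SQL purely for illustration. It rests on assumptions I have not verified against the extractor: that the padding always starts at the first column whose month equals the month to its left (real calendar months never repeat in adjacent columns), and that the year row may be sparse, so the last seen year is carried forward. The function names `trim_padding` and `unpivot` and the sample values are hypothetical.

```python
from typing import List, Tuple


def trim_padding(months: List[str]) -> int:
    """Return how many leading columns hold real data.

    Assumption: adjacent real columns never share a month name, so the
    first column equal to its left neighbour marks the start of padding.
    """
    for i in range(1, len(months)):
        if months[i] == months[i - 1]:
            return i
    return len(months)


def unpivot(years: List[str], months: List[str],
            sums: List[str]) -> List[Tuple[str, str, float]]:
    """Turn the three header/value rows into (Year, Month, SUM) records."""
    n = trim_padding(months)
    records = []
    current_year = ""
    for year, month, total in zip(years[:n], months[:n], sums[:n]):
        if year:
            # Assumption: blank year cells inherit the last year seen.
            current_year = year
        # The sheet uses a decimal comma, e.g. "36,8".
        records.append((current_year, month, float(total.replace(",", "."))))
    return records


if __name__ == "__main__":
    # Made-up sample: four real columns followed by two padded ones.
    years = ["2019", "", "", "2020", "2020", "2020"]
    months = ["Oktober", "November", "Desember", "Januar", "Januar", "Januar"]
    sums = ["36,8", "40,2", "45,6", "45,85", "45,85", "45,85"]
    for record in unpivot(years, months, sums):
        print(record)
```

Trimming on the month row rather than on the SUM row is deliberate: the SUM row legitimately contains runs of equal values (for example 7,4 several times in a row), so a repeat there does not prove that a column is padding.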