Pivot does not work as expected most of the time, i.e. it increases the number of records compared with the source table.
source_df
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
|model_family_id|classification_type|classification_value|benchmark_type_code| data_date|data_item_code|data_item_value_numeric|data_item_value_string|fiscal_year|fiscal_quarter| create_date|last_update_date|create_user_txt|update_user_txt|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
| 1| COUNTRY| HKG| MEAN|2017-12-31 00:00:00| CREDITSCORE| 13| bb-| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| OBS_CNT|2017-12-31 00:00:00| CREDITSCORE| 649| aa| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| OBS_CNT_CA|2017-12-31 00:00:00| CREDITSCORE| 649| null| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_0|2017-12-31 00:00:00| CREDITSCORE| 3| aa| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_10|2017-12-31 00:00:00| CREDITSCORE| 8| bbb+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_100|2017-12-31 00:00:00| CREDITSCORE| 23| d| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_25|2017-12-31 00:00:00| CREDITSCORE| 11| bb+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_50|2017-12-31 00:00:00| CREDITSCORE| 14| b+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_75|2017-12-31 00:00:00| CREDITSCORE| 15| b| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_90|2017-12-31 00:00:00| CREDITSCORE| 17| ccc+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
I tried the code below:
val pivot_df = source_df
  .groupBy("model_family_id", "classification_type", "classification_value", "data_item_code", "data_date", "fiscal_year", "fiscal_quarter", "create_user_txt", "create_date")
  .pivot("benchmark_type_code",
    Seq("mean", "obs_cnt", "obs_cnt_ca", "percentile_0", "percentile_10", "percentile_25", "percentile_50", "percentile_75", "percentile_90", "percentile_100")
  )
  .agg(first(
    when(col("data_item_code") === "CREDITSCORE", col("data_item_value_string"))
      .otherwise(col("data_item_value_numeric"))
  ))
I get the result below and am not sure what is wrong in my code.
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
|model_family_id|classification_type|classification_value|data_item_code| data_date|fiscal_year|fiscal_quarter|create_user_txt| create_date|mean|obs_cnt|obs_cnt_ca|percentile_0|percentile_10|percentile_25|percentile_50|percentile_75|percentile_90|percentile_100|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
| 1| COUNTRY| HKG| CREDITSCORE|2017-12-31 00:00:00| 2017| 4| LOAD|2018-03-31 14:04:18|null| null| null| null| null| null| null| null| null| null|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
1) I also tried without the Seq of values in the pivot function, but the result is still not what I expected. Any help, please?
2) In the when clause, if the pivot column matches, e.g. $"benchmark_type_code" === "OBS_CNT" | "OBS_CNT", then it should take $"data_item_value_numeric". How can I achieve this?
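One thing worth checking, offered as a sketch rather than a confirmed diagnosis: the values passed to pivot are compared with the contents of benchmark_type_code, and string comparison in Spark is case-sensitive, so "mean" never matches "MEAN" and every pivoted column comes out null. The example below uses a reduced, hypothetical stand-in for source_df (only the columns needed, with made-up names sourceDf, numericBenchmarks and PivotSketch), passes the pivot values in upper case, and branches on benchmark_type_code inside when. The treatment of OBS_CNT_CA as a numeric benchmark and the cast to string, which keeps both branches the same type, are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, first, when}

object PivotSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pivot-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical, reduced stand-in for source_df.
    val sourceDf = Seq(
      (1, "HKG", "CREDITSCORE", "MEAN", 13, Option("bb-")),
      (1, "HKG", "CREDITSCORE", "OBS_CNT", 649, Option("aa")),
      (1, "HKG", "CREDITSCORE", "OBS_CNT_CA", 649, Option.empty[String]),
      (1, "HKG", "CREDITSCORE", "PERCENTILE_50", 14, Option("b+"))
    ).toDF("model_family_id", "classification_value", "data_item_code",
      "benchmark_type_code", "data_item_value_numeric", "data_item_value_string")

    // Assumption: these benchmark types should always use the numeric value.
    val numericBenchmarks = Seq("OBS_CNT", "OBS_CNT_CA")

    val pivotDf = sourceDf
      .groupBy("model_family_id", "classification_value", "data_item_code")
      // Pivot values are written exactly as they appear in the data (upper case).
      .pivot("benchmark_type_code", Seq("MEAN", "OBS_CNT", "OBS_CNT_CA", "PERCENTILE_50"))
      .agg(first(
        when(col("benchmark_type_code").isin(numericBenchmarks: _*),
          col("data_item_value_numeric").cast("string"))
          .when(col("data_item_code") === "CREDITSCORE", col("data_item_value_string"))
          .otherwise(col("data_item_value_numeric").cast("string"))
      ))

    // Optional: lower-case the column names after pivoting.
    val renamedDf = pivotDf.toDF(pivotDf.columns.map(_.toLowerCase): _*)
    renamedDf.show(false)

    spark.stop()
  }
}
```

If lower-case column names are wanted, renaming after the pivot (as in the last step) keeps the pivot values aligned with the data.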