Pivot is not working properly in most cases, e.g. with the source table records below:
source_df
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
|model_family_id|classification_type|classification_value|benchmark_type_code| data_date|data_item_code|data_item_value_numeric|data_item_value_string|fiscal_year|fiscal_quarter| create_date|last_update_date|create_user_txt|update_user_txt|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
| 1| COUNTRY| HKG| MEAN|2017-12-31 00:00:00| CREDITSCORE| 13| bb-| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| OBS_CNT|2017-12-31 00:00:00| CREDITSCORE| 649| aa| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| OBS_CNT_CA|2017-12-31 00:00:00| CREDITSCORE| 649| null| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_0|2017-12-31 00:00:00| CREDITSCORE| 3| aa| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_10|2017-12-31 00:00:00| CREDITSCORE| 8| bbb+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_100|2017-12-31 00:00:00| CREDITSCORE| 23| d| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_25|2017-12-31 00:00:00| CREDITSCORE| 11| bb+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_50|2017-12-31 00:00:00| CREDITSCORE| 14| b+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_75|2017-12-31 00:00:00| CREDITSCORE| 15| b| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
| 1| COUNTRY| HKG| PERCENTILE_90|2017-12-31 00:00:00| CREDITSCORE| 17| ccc+| 2017| 4|2018-03-31 14:04:18| null| LOAD| null|
+---------------+-------------------+--------------------+-------------------+-------------------+--------------+-----------------------+----------------------+-----------+--------------+-------------------+----------------+---------------+---------------+
I tried the following code:
import org.apache.spark.sql.functions.{col, first, when}

// Group by the descriptive columns and turn each benchmark_type_code into its own column
val pivot_df = source_df
  .groupBy("model_family_id", "classification_type", "classification_value", "data_item_code",
    "data_date", "fiscal_year", "fiscal_quarter", "create_user_txt", "create_date")
  .pivot("benchmark_type_code",
    Seq("mean", "obs_cnt", "obs_cnt_ca", "percentile_0", "percentile_10", "percentile_25",
      "percentile_50", "percentile_75", "percentile_90", "percentile_100"))
  .agg(first(
    // CREDITSCORE rows carry a rating string; everything else is numeric
    when(col("data_item_code") === "CREDITSCORE", col("data_item_value_string"))
      .otherwise(col("data_item_value_numeric"))
  ))
I got the following result and I don't know what is wrong with my code:
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
|model_family_id|classification_type|classification_value|data_item_code| data_date|fiscal_year|fiscal_quarter|create_user_txt| create_date|mean|obs_cnt|obs_cnt_ca|percentile_0|percentile_10|percentile_25|percentile_50|percentile_75|percentile_90|percentile_100|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
| 1| COUNTRY| HKG| CREDITSCORE|2017-12-31 00:00:00| 2017| 4| LOAD|2018-03-31 14:04:18|null| null| null| null| null| null| null| null| null| null|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+-------------+-------------+-------------+-------------+--------------+
I also tried calling pivot without the Seq of values, but it still does not pivot as expected. Any help, please?
2) In the when clause, if the pivot column $"benchmark_type_code" is 'OBS_CNT' or 'OBS_CNT_CA', it should take $"data_item_value_numeric" instead. How can I implement this?
I'm not sure whether your Spark version is 2.x. My versions are Spark 2.2.1 and Scala 2.11, and with them I get the correct result:
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+--------------+-------------+-------------+-------------+-------------+
|model_family_id|classification_type|classification_value|data_item_code| data_date|fiscal_year|fiscal_quarter|create_user_txt| create_date|MEAN|OBS_CNT|OBS_CNT_CA|PERCENTILE_0|PERCENTILE_10|PERCENTILE_100|PERCENTILE_25|PERCENTILE_50|PERCENTILE_75|PERCENTILE_90|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+--------------+-------------+-------------+-------------+-------------+
| 1| COUNTRY| HKG| CREDITSCORE|2017-12-31 00:00:00| 2017| 4| LOAD|2018-03-31 14:04:18| bb-| aa| | aa| bbb+| d| bb+| b+| b| ccc+|
+---------------+-------------------+--------------------+--------------+-------------------+-----------+--------------+---------------+-------------------+----+-------+----------+------------+-------------+--------------+-------------+-------------+-------------+-------------+
Here is my code; you can try it:
import spark.implicits._
import org.apache.spark.sql.functions.{first, when}

source_df
  .groupBy($"model_family_id", $"classification_type", $"classification_value", $"data_item_code",
    $"data_date", $"fiscal_year", $"fiscal_quarter", $"create_user_txt", $"create_date")
  // No explicit value list: Spark collects the distinct benchmark_type_code values itself
  .pivot("benchmark_type_code")
  .agg(first(
    when($"data_item_code" === "CREDITSCORE", $"data_item_value_string")
      .otherwise($"data_item_value_numeric")
  ))
  .show()
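A likely reason the question's version returns only nulls: when pivot is given an explicit list of values, a cell is filled only where benchmark_type_code equals one of those values exactly, and string comparison is case-sensitive. The data holds MEAN, OBS_CNT, ... while the Seq lists mean, obs_cnt, ..., so nothing matches. The sketch below is a spark-shell-style script, assuming a local SparkSession and a trimmed copy of source_df (three rows from the table above, only the columns the pivot needs); the helper pivotWith is hypothetical and simply reruns the question's aggregation with different value lists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first, when}

// Local session for the sketch; in spark-shell, `spark` already exists
val spark = SparkSession.builder().master("local[1]").appName("pivot-case-demo").getOrCreate()
import spark.implicits._

// A few rows of source_df, trimmed to the columns the pivot needs
val source_df = Seq(
  (1, "HKG", "MEAN", "CREDITSCORE", 13, "bb-"),
  (1, "HKG", "OBS_CNT", "CREDITSCORE", 649, "aa"),
  (1, "HKG", "PERCENTILE_50", "CREDITSCORE", 14, "b+")
).toDF("model_family_id", "classification_value", "benchmark_type_code",
  "data_item_code", "data_item_value_numeric", "data_item_value_string")

// Hypothetical helper: the question's aggregation, parameterized by the pivot values
def pivotWith(values: Seq[String]): DataFrame = source_df
  .groupBy("model_family_id", "classification_value", "data_item_code")
  .pivot("benchmark_type_code", values)
  .agg(first(
    when(col("data_item_code") === "CREDITSCORE", col("data_item_value_string"))
      .otherwise(col("data_item_value_numeric"))
  ))

// Lowercase values never equal the uppercase data, so every pivoted cell is null
val broken = pivotWith(Seq("mean", "obs_cnt", "percentile_50"))
// Values spelled exactly as stored in benchmark_type_code match and get filled
val fixed = pivotWith(Seq("MEAN", "OBS_CNT", "PERCENTILE_50"))

broken.show()
fixed.show()
```

Omitting the value list, as in the answer, also avoids the mismatch because Spark then computes the distinct values itself, at the cost of an extra pass over the data; passing an explicit list is faster but must match the stored values character for character.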
With a nested condition like the following, it works fine:
.agg(first(
  when(col("data").isin("x", "a", "y", "z"),
    // Inner condition: numeric value for codes aa/bb, string value otherwise
    when(col("code").isin("aa", "bb"), col("numeric")).otherwise(col("string"))
  )
  .otherwise(col("numeric"))
))
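Applied to point 2 of the question, one option is to compute the value to pivot in a single CASE WHEN before pivoting, instead of nesting when inside agg. This is a sketch, not the asker's confirmed solution: it assumes the second count code meant in the question is OBS_CNT_CA (the question repeats OBS_CNT), a local SparkSession, three sample rows copied from the table above, and a hypothetical helper column named pivot_value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, when}

val spark = SparkSession.builder().master("local[1]").appName("pivot-numeric-demo").getOrCreate()
import spark.implicits._

// Sample rows from source_df; OBS_CNT_CA has no string value in the original data
val source_df = Seq(
  (1, "HKG", "MEAN", "CREDITSCORE", 13, "bb-"),
  (1, "HKG", "OBS_CNT", "CREDITSCORE", 649, "aa"),
  (1, "HKG", "OBS_CNT_CA", "CREDITSCORE", 649, null: String)
).toDF("model_family_id", "classification_value", "benchmark_type_code",
  "data_item_code", "data_item_value_numeric", "data_item_value_string")

// Hypothetical pivot_value column: counts take the numeric value, CREDITSCORE
// takes the rating string, anything else falls back to the numeric value.
// Spark coerces the mixed branches to a single string column.
val withValue = source_df.withColumn("pivot_value",
  when($"benchmark_type_code".isin("OBS_CNT", "OBS_CNT_CA"), $"data_item_value_numeric")
    .when($"data_item_code" === "CREDITSCORE", $"data_item_value_string")
    .otherwise($"data_item_value_numeric"))

val pivoted = withValue
  .groupBy($"model_family_id", $"classification_value", $"data_item_code")
  .pivot("benchmark_type_code", Seq("MEAN", "OBS_CNT", "OBS_CNT_CA"))
  .agg(first($"pivot_value"))

pivoted.show()
```

Keeping the branching in a plain column makes the pivot step itself trivial, and the chained `.when` reads as one CASE expression rather than nested calls.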