I want to explode columns in Spark Scala.
reference_month M M+1 M+2
2020-01-01 10 12 10
2020-02-01 10 12 10
The output should look like:
reference_month Month reference_date_id
2020-01-01 10 2020-01
2020-01-01 12 2020-02
2020-01-01 10 2020-03
2020-02-01 10 2020-02
2020-02-01 12 2020-03
2020-02-01 10 2020-04
where reference_date_id = reference_month + x (where x comes from M, M+1, M+2).
Is there any way to get output in this format in Spark Scala?
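Before reaching for Spark, the requested reshape can be pinned down in plain Python. This is a minimal stdlib-only sketch (the helper name `explode_months` is hypothetical, not from the question): each wide row (M, M+1, M+2) becomes three long rows, and the month offset x is just the position of the value in the list.

```python
from datetime import date

def explode_months(reference_month, values):
    """Turn one wide row (M, M+1, M+2) into three long rows.

    reference_month: a datetime.date for the first of the month
    values: the three monthly values [M, M+1, M+2]
    Each output row carries reference_date_id = reference_month + x months,
    where x is the position of the value in `values`.
    """
    rows = []
    for x, v in enumerate(values):
        # shift reference_month by x months via total-month arithmetic
        total = reference_month.year * 12 + (reference_month.month - 1) + x
        rows.append((reference_month.isoformat(), v,
                     f"{total // 12:04d}-{total % 12 + 1:02d}"))
    return rows

print(explode_months(date(2020, 1, 1), [10, 12, 10]))
# → [('2020-01-01', 10, '2020-01'), ('2020-01-01', 12, '2020-02'), ('2020-01-01', 10, '2020-03')]
```

The Spark answers below implement the same reshape on a DataFrame.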
We can create an array from M, M+1, M+2, then explode the array to get the required dataframe.
Example:
df.selectExpr("reference_month", "array(M,`M+1`,`M+2`) as arr").
  selectExpr("reference_month", "explode(arr) as Month").show()
+---------------+-----+
|reference_month|Month|
+---------------+-----+
| 202001| 10|
| 202001| 12|
| 202001| 10|
| 202002| 10|
| 202002| 12|
| 202002| 10|
+---------------+-----+
//or
import org.apache.spark.sql.functions.array

val cols = Seq("M", "M+1", "M+2")
df.withColumn("arr", array(cols.head, cols.tail: _*)).drop(cols: _*).
  selectExpr("reference_month", "explode(arr) as Month").show()
You can unpivot with Apache Spark's stack technique:
import org.apache.spark.sql.functions.expr
data.select($"reference_month", expr("stack(3,`M`,`M+1`,`M+2`) as (Month)")).show()
You can use the **stack** function:
from pyspark.sql.functions import expr
exp = expr("""stack(3,`M`,`M+1`,`M+2`) as (Values)""")
from pyspark.sql.functions import when, concat_ws, lpad, row_number, length, substring
from pyspark.sql.window import Window
w = Window().partitionBy("reference_month").orderBy("reference_month")
df.select("reference_month", exp)\
  .withColumn("reference_date_id", concat_ws('-', substring("reference_month", 1, 4),\
      when(length(row_number().over(w)) < 2, lpad(row_number().over(w), 2, '0'))\
      .otherwise(row_number().over(w)))).show()
+---------------+------+-----------------+
|reference_month|Values|reference_date_id|
+---------------+------+-----------------+
|         202022|    10|          2020-01|
|         202022|    12|          2020-02|
|         202022|    10|          2020-03|
|         202001|    10|          2020-01|
|         202001|    12|          2020-02|
|         202001|    10|          2020-03|
+---------------+------+-----------------+