Scala Spark: converting a date to a specific format

Question · Votes: 1 · Answers: 2

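I'm reading some JSON files into a DataFrame and want to convert one of the fields to a specific format. The JSON files contain a string field, server_received_time, in the following format, and I want to convert it to yyyy-MM-dd:hh: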

"server_received_time":"2019-01-26T03:04:36Z"

But whatever I try, it just returns null:

df.select("server_received_time")
  .withColumn("tx_date", to_date($"server_received_time", "yyy-MM-dd:hh").cast("timestamp"))
  .withColumn("tx_date2", to_timestamp($"server_received_time", "yyy-MM-dd:hh").cast("timestamp"))
  .withColumn("tx_date3", to_date(unix_timestamp($"server_received_time", "yyyy-MM-dd:hh").cast("timestamp")))
  .withColumn("tx_date4", to_utc_timestamp(to_timestamp(col("server_received_time"), "yyyy-MM-dd:hh"), "UTC"))
  .withColumn("tx_date5", to_timestamp($"server_received_time", "yyyy-MM-dd:hh"))
  .show(10, false)

+--------------------+-------+--------+--------+--------+--------+
|server_received_time|tx_date|tx_date2|tx_date3|tx_date4|tx_date5|
+--------------------+-------+--------+--------+--------+--------+
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
|2019-02-18T16:02:20Z|null   |null    |null    |null    |null    |
+--------------------+-------+--------+--------+--------+--------+

I want to get server_received_time in the format yyyy-MM-dd:hh.

scala apache-spark date-formatting
2 answers
1 vote

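The to_ functions (to_date, to_timestamp) take the format of the input string, not the output format you want. To get a specific output format, you have to convert the parsed value back to a string, for example with date_format: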

import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF and $"..." (already imported in spark-shell)

val df = Seq("2019-02-18T16:02:20Z").toDF("server_received_time")

df.select(date_format(to_timestamp($"server_received_time"), "yyy-MM-dd:hh")).show
// +---------------------------------------------------------------+
// |date_format(to_timestamp(`server_received_time`), yyy-MM-dd:hh)|
// +---------------------------------------------------------------+
// |                                                  2019-02-18:05|
// +---------------------------------------------------------------+
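The hour shown above depends on the Spark session time zone (spark.sql.session.timeZone), because date_format renders the parsed timestamp in that zone: 16:02:20Z is 04 on a 12-hour UTC clock, so the 05 comes from a non-UTC session zone. Note also that hh is the 12-hour clock (01-12), while HH is the 24-hour clock (00-23). The sketch below shows the same parse-then-format idea in plain Scala with java.time, outside Spark, with the zone pinned to UTC. The object name ReceivedTimeFormat and the use24h flag are hypothetical, and it assumes the input is always an ISO-8601 timestamp ending in Z or an explicit offset.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper illustrating parse-then-format outside Spark.
object ReceivedTimeFormat {
  // Output pattern from the question: "hh" is the 12-hour clock (01-12).
  private val Output12h = DateTimeFormatter.ofPattern("yyyy-MM-dd:hh")
  // "HH" is the 24-hour clock (00-23).
  private val Output24h = DateTimeFormatter.ofPattern("yyyy-MM-dd:HH")

  /** Parses an ISO-8601 timestamp such as "2019-02-18T16:02:20Z" and formats it in UTC. */
  def format(raw: String, use24h: Boolean): String = {
    // Step 1: parse with the *input* format; ISO_OFFSET_DATE_TIME accepts a trailing "Z".
    val parsed = OffsetDateTime.parse(raw, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
    // Step 2: pin the zone to UTC, then format with the *output* pattern.
    val utc = parsed.withOffsetSameInstant(ZoneOffset.UTC)
    utc.format(if (use24h) Output24h else Output12h)
  }
}
```

In Spark itself, setting spark.sql.session.timeZone to UTC before calling date_format should have a comparable effect on the rendered hour.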

1 vote

The pattern you pass doesn't match the format of the input string. This should work:

df.select(date_format(to_timestamp($"server_received_time", "yyyy-MM-dd'T'HH:mm:ss'Z'"), "yyyy-MM-dd:hh").as("custom_date"))
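One caveat with this pattern: quoting 'Z' makes the parser match the letter Z literally, so no offset is read and the wall-clock time is interpreted in the session time zone. If that zone is not UTC, the resulting instant shifts. A pattern letter that parses Z as the zero offset, such as X (yyyy-MM-dd'T'HH:mm:ssX), avoids this; this assumes Spark's X letter behaves like java.time's for this input. The plain-Scala sketch below (hypothetical object LiteralZVsOffset) contrasts the two patterns using java.time rather than Spark.

```scala
import java.time.{LocalDateTime, OffsetDateTime, ZoneId, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical demo object contrasting a literal 'Z' with the X offset letter.
object LiteralZVsOffset {
  val raw = "2019-02-18T16:02:20Z"

  // Quoted 'Z' only matches the character: no offset is captured, so the
  // LocalDateTime must be placed in some zone afterwards (the caller chooses it here).
  def parseLiteralZ(zone: ZoneId): ZonedDateTime =
    LocalDateTime
      .parse(raw, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'"))
      .atZone(zone)

  // X reads "Z" as the zero offset, so the instant is fixed regardless of any zone setting.
  def parseOffsetX(): OffsetDateTime =
    OffsetDateTime.parse(raw, DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX"))
}
```

When the zone is UTC the two patterns agree; they diverge only when the zone used for the literal-'Z' parse has a non-zero offset.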