PySpark: Converting a string to a timestamp gives the wrong time

Question  Votes: 1  Answers: 1

I am using the following code to convert the string column timstm_hm into a timestamp column timstm_hm_timestamp. Here is the code.

from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))

Here is the result.

-------------------------------------------------
|   timstm_hm         |   timstm_hm_timestamp   |  
-------------------------------------------------
|2018-02-08 11:04     | 2018-01-08 11:04:00     | 
-------------------------------------------------
|2018-02-27 20:34     | 2018-01-27 20:34:00     | 
-------------------------------------------------
|2018-02-23 19:47     | 2018-01-23 19:47:00     | 
-------------------------------------------------

Why is the converted value off by one month? It is strange that every row lands in January instead of February.

string pyspark timestamp unix-timestamp
1 Answer
0
votes

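You just need to replace the lowercase mm with uppercase MM. In Java date patterns, mm means minute-of-hour and MM means month-of-year, so the original pattern never reads the month and it falls back to January.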

For more information, see the Java date format documentation: Java SimpleDateFormat
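To make the effect of the pattern letters concrete, here is a minimal pure-Python sketch. It is a toy model of lenient, SimpleDateFormat-style parsing, not Spark's actual implementation; the names FIELD_BY_LETTER and toy_parse are hypothetical and exist only for illustration. It assumes that fields the pattern never mentions keep a default value and that a repeated field is overwritten by its last occurrence.

```python
import re
from datetime import datetime

# Meaning of the pattern letters used in this question (Java-style date patterns).
FIELD_BY_LETTER = {
    "y": "year",
    "M": "month",   # uppercase M: month of year
    "d": "day",
    "H": "hour",
    "m": "minute",  # lowercase m: minute of hour
}


def toy_parse(text, pattern):
    """Toy lenient parser (hypothetical helper, not part of Spark or Java)."""
    # Assumption: fields the pattern never mentions keep these defaults.
    fields = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0}
    regex = ""
    order = []
    # Split the pattern into runs of one repeated letter and literal separators.
    for run in re.finditer(r"([A-Za-z])\1*|[^A-Za-z]+", pattern):
        token = run.group(0)
        if token[0].isalpha():
            order.append(FIELD_BY_LETTER[token[0]])
            regex += r"(\d+)"
        else:
            regex += re.escape(token)
    match = re.fullmatch(regex, text)
    if match is None:
        raise ValueError("text does not match pattern")
    for name, digits in zip(order, match.groups()):
        # A later occurrence of the same field overwrites an earlier one.
        fields[name] = int(digits)
    return datetime(**fields)


# Lowercase mm: "02" is read as minutes, then overwritten by "04"; month stays 1.
print(toy_parse("2018-02-08 11:04", "yyyy-mm-dd HH:mm"))
# Uppercase MM: "02" is read as the month.
print(toy_parse("2018-02-08 11:04", "yyyy-MM-dd HH:mm"))
```

In this model the first call yields a January date and the second a February date, which mirrors the difference between the question's output and the corrected output.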

from pyspark.sql.functions import col, unix_timestamp
df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-MM-dd HH:mm").cast("timestamp")).show()

+----------------+-------------------+
|       timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-02-08 11:04:00|
+----------------+-------------------+

Alternatively, you can get the same output using just to_timestamp with the capital MM pattern.

from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()

+----------------+--------------------+
|       timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+