I am using the following code to convert the string-typed time column timstm_hm into a timestamp column timstm_hm_timestamp. Here is the code.
from pyspark.sql.functions import col, unix_timestamp
df = df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-mm-dd HH:mm").cast("timestamp"))
Here is the result.
-------------------------------------------------
| timstm_hm | timstm_hm_timestamp |
-------------------------------------------------
|2018-02-08 11:04 | 2018-01-08 11:04:00 |
-------------------------------------------------
|2018-02-27 20:34 | 2018-01-27 20:34:00 |
-------------------------------------------------
|2018-02-23 19:47 | 2018-01-23 19:47:00 |
-------------------------------------------------
Why is there a one-month difference after the conversion? It is strange: the converted values fall in January instead of February.
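You just need to replace the lowercase mm in the date part of the pattern with uppercase MM. In Java date patterns, mm means minute-of-hour and MM means month-of-year, so "yyyy-mm-dd" never reads the month at all, which is why every row came back as January.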
For more information, see the Java date format documentation: SimpleDateFormat
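To make the cause concrete, here is a minimal pure-Python sketch of a lenient, Java-style pattern parser. It is only a toy model and not Spark's or Java's actual implementation: the helper name parse_with_pattern, the FIELDS table, and the default values are all assumptions made for illustration. It shows how a pattern that uses mm twice and never uses MM would leave the month at its default of January while the later mm overwrites the earlier one.

```python
import re
from datetime import datetime

# Pattern letters this toy parser understands (hypothetical helper table).
# Case matters: "MM" is the month, "mm" is the minute.
FIELDS = {"yyyy": "year", "MM": "month", "dd": "day", "HH": "hour", "mm": "minute"}


def parse_with_pattern(text, pattern):
    """Toy lenient parser for a few Java-style pattern letters (illustration only)."""
    # Fields the pattern never mentions keep these assumed defaults.
    values = {"year": 1970, "month": 1, "day": 1, "hour": 0, "minute": 0}
    pos = 0
    for token in re.findall(r"yyyy|MM|dd|HH|mm|.", pattern):
        if token in FIELDS:
            # A later token for the same field overwrites the earlier value.
            values[FIELDS[token]] = int(text[pos:pos + len(token)])
            pos += len(token)
        else:
            # Literal separator such as "-", " " or ":" must match the input.
            if text[pos] != token:
                raise ValueError(f"expected {token!r} at position {pos}")
            pos += 1
    return datetime(**values)


# Lowercase mm in the date part: "02" is read as a minute, the month stays at 1.
print(parse_with_pattern("2018-02-08 11:04", "yyyy-mm-dd HH:mm"))  # 2018-01-08 11:04:00
# Uppercase MM: "02" is read as the month.
print(parse_with_pattern("2018-02-08 11:04", "yyyy-MM-dd HH:mm"))  # 2018-02-08 11:04:00
```

The first call reproduces the shifted value from the question, and the second gives the expected February timestamp.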
from pyspark.sql.functions import col, unix_timestamp
df.withColumn('timstm_hm_timestamp', unix_timestamp(col('timstm_hm'), "yyyy-MM-dd HH:mm").cast("timestamp")).show()
+----------------+-------------------+
| timstm_hm|timstm_hm_timestamp|
+----------------+-------------------+
|2018-02-08 11:04|2018-02-08 11:04:00|
+----------------+-------------------+
Alternatively, you can get the same output by using just to_timestamp with the capital MM pattern.
from pyspark.sql.functions import to_timestamp
df.withColumn("timestm_hm_timestamp", to_timestamp("timstm_hm","yyyy-MM-dd HH:mm" )).show()
+----------------+--------------------+
| timstm_hm|timestm_hm_timestamp|
+----------------+--------------------+
|2018-02-08 11:04| 2018-02-08 11:04:00|
+----------------+--------------------+