Spark job fails when converting a column's data type from string to date

Problem description

Getting the following error when writing to the target:

Job aborted due to stage failure: Task 18 in stage 15526.0 failed 4 times, most recent failure: Lost task 18.3 in stage 15526.0 (TID 3281950) (10.179.0.125 executor 1190): org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '04/12/2024 01:37:07.000 AM' of the type "STRING" cannot be cast to "TIMESTAMP" because it is malformed. Correct the value as per the syntax, or change its target type.

What I have tried:

  1. df_new = df.withColumn("Date", to_date(to_timestamp("LastUpdateDate", "MM/dd/yyyy hh:mm:ss.SSS a")))
  2. dfnew = df.withColumn("Date", expr("try_cast(LastUpdateDateNew AS DATE)"))
  3. # Convert string to timestamp
     df_new = df.withColumn("LastUpdateTimestamp", unix_timestamp("LastUpdateDate", "MM/dd/yyyy hh:mm:ss:SSS a").cast("timestamp"))

     # Convert timestamp to date in MM/dd/yyyy format
     # df_new_bill = df_new.withColumn("date", to_date(col("LastUpdateTimestamp"), "MM/dd/yyyy"))
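For reference, one way to narrow down which rows trigger the parse failure is to parse the column with an explicit pattern and inspect the rows that come back NULL. This is only a sketch, assuming the df and LastUpdateDate names from the attempts above; on runtimes where ANSI mode is enabled, to_timestamp itself may raise on bad input instead of returning NULL, in which case try_to_timestamp (Spark 3.5+) can be substituted.

from pyspark.sql.functions import col, to_timestamp

# Parse with the pattern that matches values like "04/12/2024 01:37:07.000 AM"
parsed = df.withColumn("parsed_ts",
                       to_timestamp(col("LastUpdateDate"), "MM/dd/yyyy hh:mm:ss.SSS a"))

# Rows where the pattern did not match come back as NULL; show a sample of them
parsed.filter(col("parsed_ts").isNull() & col("LastUpdateDate").isNotNull()) \
      .select("LastUpdateDate") \
      .show(20, truncate=False)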
    
sql-server pyspark apache-spark-sql
1 Answer

You can try this:

from pyspark.sql.functions import col, unix_timestamp

# Parse the string with an explicit pattern, then cast the epoch seconds to a timestamp
df_new = (df.withColumn("LastUpdateTimestamp",
                        unix_timestamp(col("LastUpdateDate"), "MM/dd/yyyy hh:mm:ss.SSS a")
                        .cast("timestamp"))
          )

This should convert the input column LastUpdateDate into the desired LastUpdateTimestamp output.
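If the final goal is a DATE column rather than a TIMESTAMP (as the question title suggests), here is a minimal follow-up sketch building on the snippet above; the df_final and Date names are just placeholders:

from pyspark.sql.functions import col, to_date

# Truncate the parsed timestamp to a date; no format string is needed here
df_final = df_new.withColumn("Date", to_date(col("LastUpdateTimestamp")))
df_final.select("LastUpdateDate", "LastUpdateTimestamp", "Date").show(5, truncate=False)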
