UnsupportedOperationException when converting a String to DateTime with Joda-Time

Question

I am using the Joda-Time library (org.joda.time.DateTime) to convert a string into a datetime field, but it throws an UnsupportedOperationException. Here is the main-class code:

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

// create new var with input data without header
var inputDataWithoutHeader: RDD[String] = dropHeader(inputFile)
var inputDF1 = inputDataWithoutHeader.map(_.split(",")).map { p =>
  // parse column 8 of the CSV into a Joda DateTime
  val dateYMD: DateTime = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseDateTime(p(8))
  testData(dateYMD) // fails: Spark cannot derive a schema for DateTime
}.toDF().show()

p(8) is the datetime-typed column defined in the testData case class; the CSV values for that column look like 2013-02-17 00:00:00.

Here is the testData class:

case class testData(StartDate: DateTime) { }

And this is the error I get:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:128)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:126)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:126)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:361)
    at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:47)
    at com.projs.poc.spark.ml.ProcessCSV$delayedInit$body.apply(ProcessCSV.scala:37)
Tags: scala, apache-spark, jodatime, apache-spark-sql
3 Answers
4 votes
  1. As you can read in the official documentation, dates in Spark SQL are represented using java.sql.Timestamp. If you want to use Joda-Time, you have to convert the output to that type.
  2. Spark SQL can easily handle standard date formats with a type cast:

sc.parallelize(Seq(Tuple1("2016-01-11 00:01:02")))
  .toDF("dt")
  .select($"dt".cast("timestamp"))
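
For reference, a minimal end-to-end sketch of the cast approach. The SparkContext setup and app name are assumptions added for completeness; the stack trace in the question suggests a Spark 1.x SQLContext:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("cast-demo").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(Tuple1("2016-01-11 00:01:02")))
  .toDF("dt")
  .select($"dt".cast("timestamp").as("dt"))

df.printSchema() // dt: timestamp (nullable = true)
df.show()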

1 vote

Thanks to zero323 for the solution. I used java.sql.Timestamp; this is my modified code:

val dateYMD: java.sql.Timestamp = new java.sql.Timestamp(
  DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").parseDateTime(p(8)).getMillis)
testData(dateYMD)
}.toDF().show()

and changed my case class to:

case class testData(GamingDate: java.sql.Timestamp) { }
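
Putting the pieces of this answer together, a minimal self-contained sketch, assuming the same sc and sqlContext.implicits._ as in the first answer; the one-row sample CSV and the case-class name are hypothetical stand-ins for the original data:

import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// Define the case class at top level so Spark's reflection can see it.
case class TestRow(GamingDate: Timestamp)

// Hypothetical one-row CSV: field index 8 holds the date.
val rows = sc.parallelize(Seq("a,b,c,d,e,f,g,h,2013-02-17 00:00:00"))

val df = rows.map(_.split(","))
  .map { p =>
    // Build the formatter inside the closure: Joda's DateTimeFormatter
    // is not Serializable, so it must not be captured from the driver.
    val millis = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
      .parseDateTime(p(8)).getMillis
    TestRow(new Timestamp(millis))
  }
  .toDF()

df.show() // GamingDate is now a proper timestamp column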

0 votes

The Spark Scala schema has no explicit support for Joda DateTime. You can explore other options:

1) Convert the datetime to millis and keep the field in Long format.
2) Convert the datetime to unixtime (Java format): https://stackoverflow.com/a/44957376/9083843
3) Convert the datetime to a string; you can always change it back to a Joda DateTime with DateTime.parse("stringdatetime") (options 1 and 3 are sketched after the snippet below).
4) If you still want a Joda DateTime in your Scala code, you can convert the DataFrame to a sequence:

dataframe.rdd.map(r => DateTime.parse(r(0).toString)).collect().toSeq
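
As promised, a minimal sketch of options 1 and 3, again assuming the sc and implicits from the first answer; the case-class names are hypothetical:

import org.joda.time.DateTime

// Option 1: store the instant as epoch millis (Long is a supported schema type).
case class EventMillis(startMillis: Long)

// Option 3: store the instant as an ISO-8601 string.
case class EventString(startDate: String)

val dt = DateTime.parse("2013-02-17T00:00:00")

val dfMillis = sc.parallelize(Seq(EventMillis(dt.getMillis))).toDF()
val dfString = sc.parallelize(Seq(EventString(dt.toString))).toDF()

// Round-trip option 3 back to Joda DateTime on the driver:
val back: Seq[DateTime] =
  dfString.rdd.map(r => DateTime.parse(r.getString(0))).collect().toSeq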