How do you create a Spark Dataset containing a BigDecimal of a given precision? See the following spark-shell example. You'll see that I can create a DataFrame with the desired BigDecimal precision, but cannot then convert it to a Dataset.
scala> import org.apache.spark.sql.types.DecimalType
scala> case class BD(dec: BigDecimal)
scala> val highPrecisionDf = List((BigDecimal("12345678901122334455667788990011122233"))).toDF("dec").withColumn("dec", 'dec.cast(DecimalType(38, 0)))
highPrecisionDf: org.apache.spark.sql.DataFrame = [dec: decimal(38,0)]
scala> highPrecisionDf.as[BD]
org.apache.spark.sql.AnalysisException: Cannot up cast `dec` from decimal(38,0) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "scala.math.BigDecimal", name: "dec")
- root class: "BD"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Similarly, I can't create a Dataset from a case class that uses a high-precision BigDecimal:
scala> List(BD(BigDecimal("12345678901122334455667788990011122233"))).toDS.show()
+----+
| dec|
+----+
|null|
+----+
Is there any way to create a Dataset containing a BigDecimal field with a precision different from the default decimal(38,18)?
One workaround I've found is to use a String in the Dataset to preserve the precision. This works as long as you don't need to use the value as a number (e.g. for ordering or arithmetic).
import org.apache.spark.sql.types.DecimalType

val highPrecisionDf = List((BigDecimal("12345678901122334455667788990011122233"))).toDF("dec").withColumn("dec", 'dec.cast(DecimalType(38, 0)))
case class StringDecimal(dec: String)
highPrecisionDf.as[StringDecimal]
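If you occasionally do need the value as a number, one option (a sketch, not tested against a live cluster) is to cast the string column back to the wide decimal just for that step, then keep carrying it as a String:

```scala
import org.apache.spark.sql.types.DecimalType

case class StringDecimal(dec: String)

val stringDs = highPrecisionDf.as[StringDecimal]

// Cast back to decimal(38, 0) only for the numeric operation (here, sorting),
// so the String field in the Dataset keeps the full precision everywhere else.
val sorted = stringDs.toDF()
  .withColumn("dec", 'dec.cast(DecimalType(38, 0)))
  .orderBy('dec)
```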
By default, Spark infers the schema for a Decimal type (or BigDecimal) field in a case class as DecimalType(38, 18) (see org.apache.spark.sql.types.DecimalType.SYSTEM_DEFAULT).
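You can confirm this constant directly (a quick check, e.g. in a spark-shell):

```scala
import org.apache.spark.sql.types.DecimalType

// SYSTEM_DEFAULT is the type the encoder assigns to scala.math.BigDecimal
// fields: precision 38, scale 18.
DecimalType.SYSTEM_DEFAULT  // DecimalType(38, 18)
```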
The workaround is to convert the Dataset to a DataFrame, as below:
case class TestClass(id: String, money: BigDecimal)
val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))
testDs.printSchema()
root
|-- id: string (nullable = true)
|-- money: decimal(38,18) (nullable = true)
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()
testDf
.withColumn("money", testDf("money").cast(DecimalType(10,2)))
.printSchema()
root
|-- id: string (nullable = true)
|-- money: decimal(10,2) (nullable = true)
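If you want to return to a typed Dataset after narrowing the column, a sketch of the round trip (assuming the values fit in decimal(10,2)) would be:

```scala
import org.apache.spark.sql.types.DecimalType

// Going back to the case class works here because decimal(10,2) widens
// safely into the default decimal(38,18), so no up-cast error is raised.
// Note this does not help the original decimal(38,0) case: 38 digits of
// integral precision cannot fit in decimal(38,18).
val narrowedDs = testDf
  .withColumn("money", testDf("money").cast(DecimalType(10, 2)))
  .as[TestClass]
```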
You can check this link for more detailed information: https://issues.apache.org/jira/browse/SPARK-18484