How do you create a Spark Dataset containing a BigDecimal of a given precision? See the following spark-shell example. You'll see that I can create a DataFrame with the desired BigDecimal precision, but cannot then convert it to a Dataset.
scala> import org.apache.spark.sql.types.DecimalType
scala> case class BD(dec: BigDecimal)
scala> val highPrecisionDf = List((BigDecimal("12345678901122334455667788990011122233"))).toDF("dec").withColumn("dec", 'dec.cast(DecimalType(38, 0)))
highPrecisionDf: org.apache.spark.sql.DataFrame = [dec: decimal(38,0)]
scala> highPrecisionDf.as[BD]
org.apache.spark.sql.AnalysisException: Cannot up cast `dec` from decimal(38,0) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "scala.math.BigDecimal", name: "dec")
- root class: "BD"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
Similarly, I can't create a Dataset from a case class that uses a high-precision BigDecimal:
scala> List(BD(BigDecimal("12345678901122334455667788990011122233"))).toDS.show()
+----+
| dec|
+----+
|null|
+----+
Is there any way to create a Dataset containing a BigDecimal field with a precision different from the default decimal(38,18)?
One workaround I've found is to use a String in the Dataset to preserve the precision. This works as long as you don't need to use the value as a number (e.g. for ordering or arithmetic).
import org.apache.spark.sql.types.DecimalType

val highPrecisionDf = List((BigDecimal("12345678901122334455667788990011122233"))).toDF("dec").withColumn("dec", 'dec.cast(DecimalType(38, 0)))
case class StringDecimal(dec: String)
highPrecisionDf.as[StringDecimal]
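If you occasionally do need the value as a number, one option (a sketch, not tested against a live cluster) is to cast the string column back to the wide decimal just for that step, then keep carrying it as a String:

```scala
import org.apache.spark.sql.types.DecimalType

case class StringDecimal(dec: String)

val stringDs = highPrecisionDf.as[StringDecimal]

// Cast back to decimal(38, 0) only for the numeric operation (here, sorting),
// so the String field in the Dataset keeps the full precision everywhere else.
val sorted = stringDs.toDF()
  .withColumn("dec", 'dec.cast(DecimalType(38, 0)))
  .orderBy('dec)
```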
By default, Spark infers the schema for a Decimal type (or BigDecimal) field in a case class as DecimalType(38, 18) (see org.apache.spark.sql.types.DecimalType.SYSTEM_DEFAULT).
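You can confirm this constant directly (a quick check, e.g. in a spark-shell):

```scala
import org.apache.spark.sql.types.DecimalType

// SYSTEM_DEFAULT is the type the encoder assigns to scala.math.BigDecimal
// fields: precision 38, scale 18.
DecimalType.SYSTEM_DEFAULT  // DecimalType(38, 18)
```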
The workaround is to convert the Dataset to a DataFrame, as below:
case class TestClass(id: String, money: BigDecimal)
val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))
testDs.printSchema()
root
|-- id: string (nullable = true)
|-- money: decimal(38,18) (nullable = true)
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()
testDf
.withColumn("money", testDf("money").cast(DecimalType(10,2)))
.printSchema()
root
|-- id: string (nullable = true)
|-- money: decimal(10,2) (nullable = true)
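If you want to return to a typed Dataset after narrowing the column, a sketch of the round trip (assuming the values fit in decimal(10,2)) would be:

```scala
import org.apache.spark.sql.types.DecimalType

// Going back to the case class works here because decimal(10,2) widens
// safely into the default decimal(38,18), so no up-cast error is raised.
// Note this does not help the original decimal(38,0) case: 38 digits of
// integral precision cannot fit in decimal(38,18).
val narrowedDs = testDf
  .withColumn("money", testDf("money").cast(DecimalType(10, 2)))
  .as[TestClass]
```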
You can check this link for more detailed information: https://issues.apache.org/jira/browse/SPARK-18484