I have a Dataset of ExpenseEntry. ExpenseEntry is a simple data structure that tracks the amount spent by each name in each category:
case class ExpenseEntry(
  name: String,
  category: String,
  amount: BigDecimal
)
Sample values:
ExpenseEntry("John", "candy", 0.5)
ExpenseEntry("Tia", "game", 0.25)
ExpenseEntry("John", "candy", 0.15)
ExpenseEntry("Tia", "candy", 0.55)
The expected result is:
category - name - amount
candy - John - 0.65
candy - Tia - 0.55
game - Tia - 0.25
What I want is the total amount spent per category per name, so I wrote the following Dataset query:
dataset.groupBy("category", "name").agg(sum("amount"))
The query seems correct to me in theory. However, the sum is displayed as 0E-18. My guess is that the amount is being cast to an int inside the sum function. How can I cast it to BigInt? Is my understanding of the problem correct?
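As a sanity check outside Spark, the same grouping and summing can be done with plain Scala collections; BigDecimal sums exactly, which reproduces the expected table (a minimal sketch, no Spark involved):

```scala
case class ExpenseEntry(name: String, category: String, amount: BigDecimal)

val entries = Seq(
  ExpenseEntry("John", "candy", 0.5),
  ExpenseEntry("Tia", "game", 0.25),
  ExpenseEntry("John", "candy", 0.15),
  ExpenseEntry("Tia", "candy", 0.55)
)

// Group by (category, name) and sum the amounts, mirroring
// groupBy("category", "name").agg(sum("amount")).
val totals: Map[(String, String), BigDecimal] =
  entries
    .groupBy(e => (e.category, e.name))
    .map { case (key, es) => key -> es.map(_.amount).sum }

println(totals(("candy", "John"))) // 0.65
```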
Spark maps a Scala BigDecimal field to the Catalyst type decimal(38,18), so the amount is not cast to an int inside sum; 0E-18 is BigDecimal notation for a zero with scale 18. The query itself is fine, as this self-contained example shows:

package spark

import org.apache.spark.sql.SparkSession

object SumBig extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("DataFrame-example")
    .getOrCreate()

  import spark.implicits._

  case class ExpenseEntry(
    name: String,
    category: String,
    amount: BigDecimal
  )

  val df = Seq(
    ExpenseEntry("John", "candy", 0.5),
    ExpenseEntry("Tia", "game", 0.25),
    ExpenseEntry("John", "candy", 0.15),
    ExpenseEntry("Tia", "candy", 0.55)
  ).toDF()
  df.show(false)

  // Same grouping as in the question, summing the decimal column.
  val r = df.groupBy("category", "name").sum("amount")
  r.show(false)
  // +--------+----+--------------------+
  // |category|name|sum(amount)         |
  // +--------+----+--------------------+
  // |game    |Tia |0.250000000000000000|
  // |candy   |John|0.650000000000000000|
  // |candy   |Tia |0.550000000000000000|
  // +--------+----+--------------------+
}
If you want shorter output, you can round each amount to two decimal places before summing (this needs import org.apache.spark.sql.functions.{bround, col, sum}):

df.groupBy("category", "name").agg(sum(bround(col("amount"), 2)).as("sum_amount")).show()
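One caveat about bround: it rounds with HALF_EVEN ("banker's" rounding), whereas Spark's round uses HALF_UP. The difference only shows up on ties, sketched here with plain Scala BigDecimal rather than Spark:

```scala
import scala.math.BigDecimal.RoundingMode

val tie = BigDecimal("2.125")

// HALF_EVEN (what bround uses) resolves a tie toward the even digit ...
val halfEven = tie.setScale(2, RoundingMode.HALF_EVEN) // 2.12
// ... while HALF_UP (what round uses) resolves it away from zero.
val halfUp = tie.setScale(2, RoundingMode.HALF_UP)     // 2.13
```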