How do I subtract two columns from different tables?

I have two tables: data, which contains id and features, and mean, which contains value. The features column consists of 13849 values.

+---+--------------------+
| id|            features|
+---+--------------------+
| 10|[5.82797050476074...|
| 20|[2.75361084938049...|
| 30|[-2.2027940750122...|
| 40|[4.20199108123779...|
| 50|[2.69677162170410...|
| 60|[2.65212917327880...|
| 70|[3.83443570137023...|
| 80|[0.45349338650703...|
| 90|[3.12527608871459...|
+---+--------------------+

Second table:

+------------------+
|             value|
+------------------+
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
|2.4848911616270923|
+------------------+

Code:

case class DataClass(id: Int, features:Double)
val newDataDF = spark.read.parquet("file:///usr/local/spark/dataset/model/data/user/part-r-00000-7d55ba81-5761-4e36-b488-7e6214df2a68.snappy.parquet").toDF()//.toDF()//.map(_.split(",")).map(p => DataClass(p(0).trim.toInt, p(1).trim.toDouble)).toDF()
newDataDF.withColumn("features", ((newDataDF("features")-2.4848911616270923)/1.8305483113586494))

This gives me the error:

cannot resolve '(features - 2.4848911616270923D)' due to data type mismatch: differing types in '(features - 2.4848911616270923D)' (array and double).

How can I fix this?

scala apache-spark-sql subtraction
1 Answer

Try using:

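import org.apache.spark.sql.functions.col
// Index 0 selects the first array element, so the subtraction is double minus double.
val dfWithCalculatedFeatures = newDataDF.withColumn("features", (col("features")(0) - 2.4848911616270923) / 1.8305483113586494)
dfWithCalculatedFeatures.show() // show() returns Unit, so call it separately instead of assigning its result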
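
Note that the line above only operates on the first element of each array (index 0). If the goal is to standardize every element and to take the mean from the second table rather than hard-coding it, a UDF is one possible approach. The following is a minimal sketch for spark-shell, not a definitive implementation. It assumes that features is stored as array<double> and that the second table has a single double column named value holding the same number in every row. The names sampleDF, meanDF, standardize and standardizedDF are hypothetical, and the two small DataFrames are made-up stand-ins for the real Parquet data. The standard deviation is the constant from the question's code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// In spark-shell a session already exists; getOrCreate() simply returns it.
val spark = SparkSession.builder().master("local[*]").appName("standardize-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the two tables in the question.
val sampleDF = Seq((10, Seq(5.0, 1.0)), (20, Seq(2.0, 3.0))).toDF("id", "features")
val meanDF = Seq(2.4848911616270923, 2.4848911616270923).toDF("value")

// Assumption: every row of the second table holds the same mean,
// so reading the first row is enough to get a plain Double.
val mean = meanDF.first().getDouble(0)
val std = 1.8305483113586494 // constant taken from the question's code

// Assumption: features is array<double>; Spark hands array columns to Scala UDFs as Seq.
val standardize = udf((xs: Seq[Double]) => xs.map(x => (x - mean) / std))

// Apply (x - mean) / std to every element of the array, not just the first one.
val standardizedDF = sampleDF.withColumn("features", standardize(col("features")))
standardizedDF.show(false)
```

With the real data, newDataDF would take the place of sampleDF. On Spark 2.4 or later, the SQL higher-order function transform may be an alternative to the UDF, but which fits better depends on the Spark version in use.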