Sorted map column in a Spark DataFrame comes back in random order after saving to Hive


I'm new to Spark. I'm trying to sort a map-type column in a Spark DataFrame with a UDF and then save the data to Hive. The code is as follows:

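import org.apache.spark.ml.linalg.SparseVector   // assuming spark.ml HashingTF/IDF output
import org.apache.spark.sql.functions.udf
import scala.collection.immutable.ListMap
import scala.collection.mutable

// `threshold` is a Double defined elsewhere (not shown)
val vectorHead = udf { (z: SparseVector, x: SparseVector, y: mutable.WrappedArray[String]) =>

  // Collect word -> TF*IDF weight for entries at or above the threshold
  var map2 = Map.empty[String, Double]

  for (i <- x.values.indices) {
    if (x.values(i) * z.values(i) >= threshold && y(i) != "") {
      map2 += (y(i) -> x.values(i) * z.values(i))
    }
  }

  // Sort by descending weight; ListMap preserves insertion order
  ListMap(map2.toSeq.sortBy(-_._2): _*)

}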

val rescaledDataNew = dataFrame.withColumn("words_with_tf*idf", vectorHead(dataFrame("TFFeatures"), dataFrame("IDFFeatures"), dataFrame("new_words"))).drop("words","TFFeatures","IDFFeatures")

println("This is the new data after drop low TF*IDF")
rescaledDataNew.show()
rescaledDataNew.createTempView("TEST")
rescaledDataNew.sqlContext.sql("DROP TABLE IF EXISTS " + dataSavePath)
rescaledDataNew.sqlContext.sql("CREATE TABLE " + dataSavePath + " AS SELECT * FROM TEST")
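The sorting step inside `vectorHead` can be tried in isolation. The sketch below is a hypothetical pure-Scala rewrite of the UDF body over plain arrays (the object `TfIdfSort` and method `topWeighted` are illustrative names, not part of Spark or the question), assuming the same threshold rule and the same "last weight wins" behaviour for repeated words:

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper (not from the question): the body of `vectorHead`
// rewritten over plain arrays so the sort can be run without Spark.
object TfIdfSort {
  def topWeighted(
      tf: Array[Double],
      idf: Array[Double],
      words: Seq[String],
      threshold: Double
  ): ListMap[String, Double] = {
    // word -> weight for every position that passes the threshold and has a
    // non-empty word; like the original `map2 +=`, a repeated word keeps the
    // last weight seen.
    val weights: Map[String, Double] = tf.indices.collect {
      case i if tf(i) * idf(i) >= threshold && words(i).nonEmpty =>
        words(i) -> tf(i) * idf(i)
    }.toMap
    // Sort by descending weight; ListMap iterates in insertion order,
    // so later iteration sees the sorted order.
    ListMap(weights.toSeq.sortBy(-_._2): _*)
  }
}
```

Inside the UDF, `tf` and `idf` play the roles of `x.values` and `z.values`. Note that the original code indexes both `values` arrays and `new_words` by the same position `i`, which assumes the two sparse vectors share the same non-zero indices and that `new_words` is aligned with them.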

It runs without errors or warnings, and the result is:

{"美食":6.978342,"游艇":8.91278,"翠园":6.1228666,"花桥镇":10.032949,"青咖喱鸡":6.914152}

What I want is:

{"花桥镇":10.032949,"游艇":8.91278,"美食":6.978342,"青咖喱鸡":6.914152,"翠园":6.1228666}

When I change the code to

ListMap(map2.toSeq.sortBy(-_._2):_*).toString

the result is:

Map{"花桥镇"->10.032949,"游艇"->8.91278,"美食"->6.978342,"青咖喱鸡"->6.914152,"翠园"->6.1228666}
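`toString` on a `ListMap` uses Scala's own collection formatting, not JSON. If a string column is acceptable in the Hive table (an assumption about the target schema), one option is to have the UDF build the JSON text itself in the map's iteration order; once the value is a string, no later map conversion can reorder it. A minimal sketch, where `OrderedJson` and `toJson` are hypothetical names and only quotes and backslashes are escaped:

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper: render an ordered map as a JSON object string,
// emitting entries in the map's iteration order.
object OrderedJson {
  // Minimal escaping for quotes and backslashes only; a real JSON library
  // would also handle control characters.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  def toJson(m: ListMap[String, Double]): String =
    m.toSeq.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")
}
```

For example, `OrderedJson.toJson(ListMap("b" -> 10.5, "a" -> 8.25))` yields `{"b":10.5,"a":8.25}`.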

So, can anyone tell me what I should do to get what I want?

scala dictionary apache-spark-sql spark-dataframe
1 Answer

This seems to be an issue with the show() method. Try writing the DataFrame to a file; it should be sorted the way you want.
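Whichever step is responsible, the sorted order exists only as the `ListMap`'s iteration order, so it survives only where entries are read back in that order. The pure-Scala sketch below (no Spark or Hive involved; the object name `OrderCheck` is hypothetical) contrasts an ordered `ListMap`, a plain `Map` built from the same pairs, and an ordered sequence of pairs. It does not show whether any particular Spark or Hive code path copies entries into a hash-based map.

```scala
import scala.collection.immutable.ListMap

// Pure-Scala illustration (no Spark): where the sorted order actually lives.
object OrderCheck {
  // Sorted by descending score; ListMap iterates in insertion order.
  val sorted: ListMap[String, Double] =
    ListMap("e" -> 5.0, "d" -> 4.0, "c" -> 3.0, "b" -> 2.0, "a" -> 1.0)

  // The same pairs copied into a plain immutable Map. With more than four
  // entries this is hash-based, so its iteration order is unspecified.
  val rehashed: Map[String, Double] = sorted.toSeq.toMap

  // An ordered sequence of (word, score) pairs keeps the order by construction.
  val asPairs: Seq[(String, Double)] = sorted.toSeq
}
```

`rehashed` compares equal to `sorted` (map equality ignores order), yet only `sorted` and `asPairs` promise the descending order. If the order must survive the round trip regardless of the sink, storing an ordered sequence of pairs (an array column) instead of a map is one commonly used alternative, at the cost of changing the table schema.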

© www.soinside.com 2019 - 2024. All rights reserved.