Scala Spark: sort a DataFrame map column by key

Problem description
import spark.implicits._
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

def reverseMap(colName: Column) = map_from_arrays(map_values(colName), map_keys(colName))

val testDF = Seq(("cat", Map("black" -> 3, "brown" -> 5, "white" -> 1)), ("dog", Map("cream" -> 6, "black" -> 5, "white" -> 2)))
  .toDF("animal", "ageMap")

testDF.show(false)

val testDF1 = testDF.withColumn("keySort", map_from_entries(array_sort(map_entries(col("ageMap")))))

This code runs fine on Spark > 3. I want to run it on Spark < 3.

scala apache-spark maps key-value-observing
1 Answer

From your comments I understand that your code runs on v3.2.2 but not on v2.4.5.

Your problem is that map_entries does not exist in Spark v2.4.5. You can get the same functionality by extracting the keys and values separately with map_keys and map_values, then combining them with arrays_zip. Because array_sort orders an array of structs field by field, sorting the zipped (key, value) structs puts the entries in key order.

The first part stays exactly the same:

import spark.implicits._
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

def reverseMap(colName: Column) = map_from_arrays(map_values(colName), map_keys(colName))
val testDF = Seq(("cat", Map("black" -> 3, "brown" -> 5, "white" -> 1)), ("dog", Map("cream" -> 6, "black" -> 5, "white" -> 2))).toDF("animal", "ageMap")

testDF.show(false)
+------+------------------------------------+
|animal|ageMap                              |
+------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|
+------+------------------------------------+
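
As an aside, the reverseMap helper from the question also works unchanged on v2.4.5, since map_from_arrays, map_keys, and map_values all exist there. A quick sanity check, using the testDF defined above (the column name "reversed" is just illustrative):

testDF.withColumn("reversed", reverseMap(col("ageMap"))).show(false)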

The difference is in how you define testDF1:

val testDF1 = testDF
  .withColumn("keys", map_keys(col("ageMap")))      // extract the keys as an array
  .withColumn("values", map_values(col("ageMap")))  // extract the values as an array
  .withColumn("keySort", map_from_entries(array_sort(arrays_zip(col("keys"), col("values")))))  // zip, sort by key, rebuild the map
  .select("animal", "ageMap", "keySort")

testDF1.show(false)
+------+------------------------------------+------------------------------------+
|animal|ageMap                              |keySort                             |
+------+------------------------------------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|[black -> 5, cream -> 6, white -> 2]|
+------+------------------------------------+------------------------------------+

This code runs successfully on a v2.4.5 spark-shell.
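
If you prefer not to create the intermediate keys and values columns, the same transformation can be written as a single expression. A minimal sketch (the val name testDF2 is just illustrative; it relies on array_sort comparing the zipped structs field by field, so by key first):

// Hypothetical one-liner variant: zip keys with values, sort the
// (key, value) structs, and rebuild the map in a single expression.
val testDF2 = testDF.withColumn(
  "keySort",
  map_from_entries(array_sort(arrays_zip(map_keys(col("ageMap")), map_values(col("ageMap")))))
)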
