import spark.implicits._
import org.apache.spark.sql.Column
def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))
val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2)))
.toDF("animal","ageMap")
testDF.show(false)
val testDF1 = testDF.withColumn("keySort",map_from_entries(array_sort(map_entries(col("ageMap")))))
This code runs fine on Spark 3 and later. I want to run it on a Spark version below 3.
From your comment I understand that your code runs on v3.2.2 but not on v2.4.5.
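The problem is that map_entries does not exist in Spark v2.4.5. You can get the same functionality by extracting the keys and values separately with map_keys and map_values, and then combining them with arrays_zip.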
The first part is exactly the same:
import spark.implicits._
import org.apache.spark.sql.Column
def reverseMap(colName:Column) = map_from_arrays(map_values(colName),map_keys(colName))
val testDF = Seq(("cat",Map("black"->3,"brown"->5,"white"->1)), ("dog",Map("cream"->6,"black"->5,"white"->2))).toDF("animal","ageMap")
testDF.show(false)
+------+------------------------------------+
|animal|ageMap                              |
+------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|
+------+------------------------------------+
The difference is in how you define testDF1:
val testDF1 = testDF
.withColumn("keys", map_keys(col("ageMap")))
.withColumn("values", map_values(col("ageMap")))
.withColumn("keySort", map_from_entries(array_sort(arrays_zip(col("keys"), col("values")))))
.select("animal", "ageMap", "keySort")
testDF1.show(false)
+------+------------------------------------+------------------------------------+
|animal|ageMap                              |keySort                             |
+------+------------------------------------+------------------------------------+
|cat   |[black -> 3, brown -> 5, white -> 1]|[black -> 3, brown -> 5, white -> 1]|
|dog   |[cream -> 6, black -> 5, white -> 2]|[black -> 5, cream -> 6, white -> 2]|
+------+------------------------------------+------------------------------------+
This code ran successfully in a v2.4.5 spark-shell.
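For intuition, the extract, zip, sort, rebuild steps can be mirrored with plain Scala collections, with no Spark involved. This is only a rough sketch of the idea: the helper name sortByKey is hypothetical, and ListMap is used here as an assumption to keep the sorted entry order visible.

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper: a plain-Scala analogue of the Spark pipeline above.
def sortByKey(m: Map[String, Int]): ListMap[String, Int] = {
  val keys   = m.keys.toSeq        // analogue of map_keys
  val values = m.values.toSeq      // analogue of map_values
  val zipped = keys.zip(values)    // analogue of arrays_zip: pairs of (key, value)
  // Tuples sort by their first element, then their second (analogue of array_sort);
  // ListMap keeps the sorted insertion order (analogue of map_from_entries).
  ListMap(zipped.sorted: _*)
}

val dog = sortByKey(Map("cream" -> 6, "black" -> 5, "white" -> 2))
println(dog.toList) // List((black,5), (cream,6), (white,2))
```

Because the pairs are compared by key first, sorting the zipped pairs amounts to sorting the map by key, which is the same reason array_sort over the zipped structs gives the keySort column shown above.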