Indexing into a map using the value of a different column in Spark


I have a dataframe with the following schema:

|-- A: map (nullable = true)
|    |-- key: string
|    |-- value: array (valueContainsNull = true)
|    |    |-- element: struct (containsNull = true)
|    |    |    |-- uid: string (nullable = true)
|    |    |    |-- price: double (nullable = true)
|    |    |    |-- recordtype: string (nullable = true)
|-- keyindex: string (nullable = true)

For example, if I have the following data:

 {"A":{
 "innerkey_1":[{"uid":"1","price":0.01,"recordtype":"STAT"},
               {"uid":"6","price":4.3,"recordtype":"DYN"}],
 "innerkey_2":[{"uid":"2","price":2.01,"recordtype":"DYN"},
               {"uid":"4","price":6.1,"recordtype":"DYN"}]},
 "innerkey_2"}

I read the data into a dataframe using the following schema:

import org.apache.spark.sql.types._

val schema = new StructType()
  .add("A", MapType(StringType, ArrayType(new StructType()
    .add("uid", StringType)
    .add("price", DoubleType)
    .add("recordtype", StringType), true)))
  .add("keyindex", StringType)
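
For reference, a minimal read of the record above might look like this (a sketch; "data.json" is a hypothetical path holding the JSON, one object per line):

val df = spark.read.schema(schema).json("data.json")
df.printSchema()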

I am trying to figure out whether I can use keyindex to select values from the map. Since keyindex in this example is "innerkey_2", I would like the output to be:

[{"uid":"2","price":2.01,"recordtype":"DYN"},
 {"uid":"4","price":6.1,"recordtype":"DYN"}]

Thanks for your help!

scala apache-spark spark-dataframe user-defined-functions
1 Answer

getItem should do the trick:

scala> val df = Seq(("innerkey2", Map("innerkey2" -> Seq(("1", 0.01, "STAT"))))).toDF("keyindex", "A")
df: org.apache.spark.sql.DataFrame = [keyindex: string, A: map<string,array<struct<_1:string,_2:double,_3:string>>>]

scala> df.select($"A"($"keyindex")).show
+---------------+
|    A[keyindex]|
+---------------+
|[[1,0.01,STAT]]|
+---------------+
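
Applied to the question's schema, the same column-keyed lookup can also be written with element_at (available since Spark 2.4). A minimal sketch, assuming df holds the question's data read with the schema above:

import org.apache.spark.sql.functions.{col, element_at}

// Look up the array of structs stored under the key given by the keyindex column;
// equivalent to $"A"($"keyindex") above.
df.select(element_at(col("A"), col("keyindex")).as("matched")).show(false)

For the example record this returns the two "innerkey_2" structs, matching the expected output in the question.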