DataFrame中的平面嵌套模式,获得AnalysisException:无法解析列名

问题描述 投票:0回答:1

我有DF:

 -- str1: struct (nullable = true)
 |    |-- a1: string (nullable = true)
 |    |-- a2: string (nullable = true)
 |    |-- a3: string (nullable = true)
 |-- str2: string (nullable = true)
 |-- str3: string (nullable = true)
 |-- str4: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- b1: string (nullable = true)
 |    |    |-- b2: string (nullable = true)
 |    |    |-- b3: boolean (nullable = true)
 |    |    |-- b4: struct (nullable = true)
 |    |    |    |-- c1: integer (nullable = true)
 |    |    |    |-- c2: string (nullable = true)
 |    |    |    |-- c3: integer (nullable = true)

我正在尝试使其变平,以实现我在下面使用的代码:

  def flattenSchema(schema: StructType, prefix: String = null):Array[Column]=
  {
    schema.fields.flatMap(f => {
      val colName = if (prefix == null) f.name else (prefix + "." + f.name)

      f.dataType match {
        case st: StructType => flattenSchema(st, colName)
        case at: ArrayType =>
          val st = at.elementType.asInstanceOf[StructType]
          flattenSchema(st, colName)
        case _ => Array(new Column(colName).as(colName))
      }
    })
  }


val d1 = df.select(flattenSchema(df.schema):_*)

它在输出以下给我:

 |-- str1.a1: string (nullable = true)
 |-- str1.a2: string (nullable = true)
 |-- str1.a3: string (nullable = true)
 |-- str2: string (nullable = true)
 |-- str3: string (nullable = true)
 |-- str4.b1: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- str4.b2: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- str4.b3: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- str4.b4.c1: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- str4.b4.c2: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- str4.b4.c3: array (nullable = true)
 |    |-- element: integer (containsNull = true)

我尝试查询时出现问题:

d1.select("str2").show-它没有给我任何问题

但是当我在任何平坦的嵌套列上查询时>]

d1.select("str1.a1")

错误:

org.apache.spark.sql.AnalysisException: cannot resolve '`str1.a1`' given input columns: ....

我在这里做错了什么?或其他任何方式来达到期望的结果?

我有一个DF:-str1:struct(nullable = true)| |-a1:字符串(nullable = true) |-a2:字符串(nullable = true) |-a3:字符串(nullable = true)|-str2:字符串(nullable = true)...

scala dataframe apache-spark nested
1个回答
0
投票

Spark不支持带dot(。)

© www.soinside.com 2019 - 2024. All rights reserved.