Extracting data from a nested JSON file


This is the schema of my data, and I want to extract 'from' from it. I tried df3 = df.select(df.transcript.data.from.alias("Type")) and got an invalid syntax error.

How can I extract it?

root
 |-- contactId: long (nullable = true)
 |-- mediaLegId: string (nullable = true)
 |-- transcript: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- action: string (nullable = true)
 |    |    |-- data: struct (nullable = true)
 |    |    |    |-- chatId: string (nullable = true)
 |    |    |    |-- customerInfo: struct (nullable = true)
 |    |    |    |    |-- customerIdentifierToken: string (nullable = true)
 |    |    |    |    |-- customerIdentifierType: string (nullable = true)
 |    |    |    |    |-- customerName: string (nullable = true)
 |    |    |    |    |-- initialQuestion: string (nullable = true)
 |    |    |    |-- entryPoint: string (nullable = true)
 |    |    |    |-- from: string (nullable = true)
 |    |    |    |-- lang: string (nullable = true)
 |    |    |    |-- parkDuration: long (nullable = true)
 |    |    |    |-- parkNote: string (nullable = true)
 |    |    |    |-- participant: struct (nullable = true)
 |    |    |    |    |-- disconnectReason: string (nullable = true)
 |    |    |    |    |-- displayName: string (nullable = true)
 |    |    |    |    |-- participantId: string (nullable = true)
 |    |    |    |    |-- preferences: struct (nullable = true)
 |    |    |    |    |    |-- language: string (nullable = true)
 |    |    |    |    |-- state: string (nullable = true)
 |    |    |    |    |-- userName: string (nullable = true)
 |    |    |    |-- reconnected: boolean (nullable = true)
 |    |    |    |-- relatedData: string (nullable = true)
 |    |    |    |-- text: string (nullable = true)
 |    |    |    |-- timestamp: long (nullable = true)
 |    |    |    |-- transcriptText: string (nullable = true)
 |    |    |    |-- transferNote: string (nullable = true)

pyspark apache-spark-sql pyspark-sql pyspark-dataframes
1 Answer

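`from` is a reserved keyword in Python, so it cannot be written as an attribute after a dot; refer to the field by its string name instead. Try it like this: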

from pyspark.sql import functions as F

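# Explode the array into one row per transcript element, expand the structs, then select "from" by name.
df.select(F.explode("transcript").alias('transcript')).select('transcript.*').select("data.*").select("from").show()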
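
The syntax error from the question can be reproduced without a Spark session. The sketch below only parses two expressions with Python's `ast` module and never executes them. `parses` is a helper name introduced here for illustration, and the bracket form is shown on the assumption that indexing a Column by field name is an acceptable alternative to attribute access.

```python
import ast
import keyword


def parses(expr: str) -> bool:
    """Return True if expr is syntactically valid Python; expr is never executed."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


# The attempt from the question: `from` follows a dot, which Python rejects.
attribute_style = 'df.select(df.transcript.data.from.alias("Type"))'
# Assumed alternative: look the fields up by string name instead.
bracket_style = 'df.select(df["transcript"]["data"]["from"].alias("Type"))'

print(keyword.iskeyword("from"))  # True
print(parses(attribute_style))    # False
print(parses(bracket_style))      # True
```

Note that bracket indexing on the unexploded `transcript` column should give one array of `from` values per row rather than one row per element, which is likely why the answer explodes the array first.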