Extract data from nested JSON with a PySpark DataFrame

Question  Votes: 0  Answers: 1

In Databricks I have a DataFrame named product that has several columns, one of which is json_col. The data in json_col looks like this:

html:null ,language:null ,message:null ,product:{"title":"selecteddata","search_alias":{"title":"Home ","value":"kitchen"},,"content":{"all_images":[{"ee":"eeee","name":"front page asdsadaasdasd"},{"dasdas":"sduahdjka","name":"asdsadaasdasd"},{"dasdas":"edkjas","name":"asdsadaasdasd "},{"dasdas":"dakjs","name":"Plumeri asdsadaasdasd Spucktüche"},"dasdas":"dkasjhasdasnd","name":"diasdjaskldnasn"}],"body_text":"dkjasda,"},"climate_pledge_friendly":"No",}

Out of all of this, I only need to select the content data:

{"all_images":[{"ee":"eeee","name":"front page asdsadaasdasd"},{"dasdas":"sduahdjka","name":"asdsadaasdasd"},{"dasdas":"edkjas","name":"asdsadaasdasd "},{"dasdas":"dakjs","name":"Plumeri asdsadaasdasd Spucktüche"},"dasdas":"dkasjhasdasnd","name":"diasdjaskldnasn"}],"body_text":"dkjasda,"}

I am using the following code:

from pyspark.sql.functions import col, get_json_object

# Extract a_plus_content as a dictionary
extracted_data = product.withColumn('content_extract', get_json_object('json_col', '$.content'))

# Show the results (optional)
extracted_data.select('content_extract').show()

But the result is null. Can anyone offer some expert advice or another way to solve this?

pyspark databricks
1 answer
0
votes

Use the from_json function with a schema to extract just the part of the JSON string you need. Check the code below.

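from pyspark.sql.functions import expr

(
    product
    .withColumn(
        "content",
        # Parse json_col with a schema that keeps only product.content;
        # declaring content as string returns the nested object as raw JSON text
        expr("from_json(json_col, 'product struct<content:string>').product.content")
    )
    .select("content")
    .show(2, False)
)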
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|content                                                                                                                                                                                                                                                                                           |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"all_images":[{"ee":"eeee","name":"front page asdsadaasdasd"},{"dasdas":"sduahdjka","name":"asdsadaasdasd"},{"dasdas":"edkjas","name":"asdsadaasdasd "},{"dasdas":"dakjs","name":"Plumeri asdsadaasdasd Spucktüche"},{"dasdas":"dkasjhasdasnd","name":"diasdjaskldnasn"}],"body_text":"dkjasda,"}|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
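Why the original attempt returns null: the path '$.content' asks for a top-level key named content, but in json_col the content object sits under product, so the path has to be '$.product.content' (for example get_json_object('json_col', '$.product.content')). get_json_object also returns null when the string is not valid JSON, and the sample as pasted contains stray commas (",," and ",}") and a missing "{", so it is worth checking the real values too. The sketch below is plain Python, not Spark: get_json_path is a hypothetical helper that roughly imitates get_json_object for simple dotted paths only, and row is a simplified, made-up stand-in for one json_col value.

```python
import json
from typing import Optional


def get_json_path(json_str: str, path: str) -> Optional[str]:
    """Hypothetical plain-Python imitation of Spark's get_json_object,
    limited to simple dotted paths like '$.product.content'
    (no array indexes, no wildcards).

    Returns the value as a compact JSON string (plain strings unquoted),
    or None if the input is invalid JSON or the path does not exist.
    """
    try:
        node = json.loads(json_str)
    except json.JSONDecodeError:
        return None  # invalid JSON -> null
    if not path.startswith("$"):
        return None
    keys = [k for k in path[1:].split(".") if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # missing key -> null
        node = node[key]
    if node is None:
        return None
    if isinstance(node, str):
        return node
    return json.dumps(node, separators=(",", ":"), ensure_ascii=False)


# Simplified, made-up stand-in for one json_col value
row = '{"html":null,"product":{"title":"selecteddata","content":{"body_text":"dkjasda,"}}}'

print(get_json_path(row, "$.content"))          # None: content is not a top-level key
print(get_json_path(row, "$.product.content"))  # {"body_text":"dkjasda,"}
print(get_json_path('{"product":{"content":1},}', "$.product.content"))  # None: trailing comma
```

If the stored JSON is valid, changing only the path in the original get_json_object call should be enough; the from_json approach is handy when you later want content parsed into typed fields with a fuller schema.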
© www.soinside.com 2019 - 2024. All rights reserved.