I'm importing a JSON file into a PySpark DataFrame. I loaded the JSON with the following code:
df = sqlContext.read.json("json_file.json").select("item", "attributes")
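I want to split the attributes column into multiple columns, one per attribute.
Here is a sample of the JSON (one object per line):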
{"item":"item-1","attributes":{"att-a":"att-a-15","att-b":"att-b-10","att-c":"att-c-7"}}
{"item":"item-2","attributes":{"att-a":"att-a-15","att-b":"att-b-10","att-c":"att-c-7"}}
The desired output is:
+------+--------+--------+-------+
|  item|   att-a|   att-b|  att-c|
+------+--------+--------+-------+
|item-1|att-a-15|att-b-10|att-c-7|
|item-2|att-a-15|att-b-10|att-c-7|
+------+--------+--------+-------+
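To make the target shape concrete, here is a plain-Python illustration (no Spark involved) of the reshaping the table describes: each record's nested attributes object is lifted to the top level next to item. The helper name flatten_record is hypothetical and only for illustration.

```python
import json

# The two sample records from the question, one JSON object per line.
lines = [
    '{"item":"item-1","attributes":{"att-a":"att-a-15","att-b":"att-b-10","att-c":"att-c-7"}}',
    '{"item":"item-2","attributes":{"att-a":"att-a-15","att-b":"att-b-10","att-c":"att-c-7"}}',
]

def flatten_record(record, nested_key="attributes"):
    """Hypothetical helper: move the fields of record[nested_key] up to the top level."""
    flat = {k: v for k, v in record.items() if k != nested_key}
    flat.update(record.get(nested_key, {}))
    return flat

rows = [flatten_record(json.loads(line)) for line in lines]
for row in rows:
    print(row)
```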
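Since attributes is loaded as a struct column, select attributes.* to expand it into one column per field:
df.select('item', 'attributes.*').show()
This way each attribute appears in its own column.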