Exploding JSON in Spark SQL (converting all keys to columns)


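I have the data below. I want every JSON key to become a column (the table schema should consist of columns named after the JSON keys), and every value to appear in the rows.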

WITH dataset AS (
   SELECT '{
      "id": 1,
      "name": "John Doe",
      "age": 30,
      "contacts": [
         {
            "type": "email",
            "value": "john.doe@example.com"
         },
         {
            "type": "phone",
            "value": "555-1234"
         }
      ],
      "orders": [
         {
            "orderId": "A123",
            "products": [
               {
                  "productId": "P001",
                  "name": "Product 1",
                  "quantity": 2
               },
               {
                  "productId": "P002",
                  "name": "Product 2",
                  "quantity": 1
               }
            ],
            "totalAmount": 150.99
         },
         {
            "orderId": "B456",
            "products": [
               {
                  "productId": "P003",
                  "name": "Product 3",
                  "quantity": 3
               }
            ],
            "totalAmount": 75.50
         }
      ]
   }' AS myblob
)

I am expecting a result like this:

+----+----------+-----+-------------+---------+-------+---------+--------------+----------------------+----------+------------+--------------+----------+-------------+
| id | name     | age | street      | city    | state | zipcode | contact_type | contact_value        | order_id | product_id | product_name | quantity | totalAmount |
+----+----------+-----+-------------+---------+-------+---------+--------------+----------------------+----------+------------+--------------+----------+-------------+
| 1  | John Doe | 30  | 123 Main St | Anytown | CA    | 12345   | email        | john.doe@example.com | A123     | P001       | Product 1    | 2        | 150.99      |
| 1  | John Doe | 30  | 123 Main St | Anytown | CA    | 12345   | email        | john.doe@example.com | A123     | P002       | Product 2    | 1        | 150.99      |
| 1  | John Doe | 30  | 123 Main St | Anytown | CA    | 12345   | phone        | 555-1234             | B456     | P003       | Product 3    | 3        | 75.50       |
+----+----------+-----+-------------+---------+-------+---------+--------------+----------------------+----------+------------+--------------+----------+-------------+

Thanks in advance.

arrays json apache-spark apache-spark-sql hive
1 Answer

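The SQL below parses the JSON string with FROM_JSON and then flattens each array, one level per CTE, with INLINE.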

WITH blob_cte AS (
  -- Parse the JSON string into a struct using an explicit DDL schema.
  SELECT 
    FROM_JSON(
      '{"id": 1, "name": "John Doe", "age": 30, "contacts": [{"type": "email", "value": "john.doe@example.com"}, {"type": "phone", "value": "555-1234"}], "orders": [{"orderId": "A123", "products": [{"productId": "P001", "name": "Product 1", "quantity": 2}, {"productId": "P002", "name": "Product 2", "quantity": 1}], "totalAmount": 150.99}, {"orderId": "B456", "products": [{"productId": "P003", "name": "Product 3", "quantity": 3}], "totalAmount": 75.50}]}', 
      'age BIGINT,contacts ARRAY<STRUCT<type: STRING, value: STRING>>,id BIGINT,name STRING,orders ARRAY<STRUCT<orderId: STRING, products: ARRAY<STRUCT<name: STRING, productId: STRING, quantity: BIGINT>>, totalAmount: DOUBLE>>'
    ) AS blob
), 
json_cte AS (
  -- Promote the top-level struct fields to columns.
  SELECT 
    blob.* 
  FROM 
    blob_cte
), 
contact_cte AS (
  -- INLINE(contacts) emits one row per contact, with columns type and value.
  SELECT 
    id, 
    name, 
    age, 
    INLINE(contacts), 
    orders 
  FROM 
    json_cte
), 
orders_cte AS (
  -- INLINE(orders) emits one row per order, with columns orderId, products and totalAmount.
  SELECT 
    id, 
    name, 
    age, 
    type, 
    value, 
    INLINE(orders) 
  FROM 
    contact_cte
) 
-- INLINE(products) emits name, productId and quantity (schema order); the product
-- columns are aliased so that the product name does not clash with the person's name.
-- Every contact is paired with every order line, so this sample yields 2 x 3 = 6 rows.
SELECT 
  id, 
  name, 
  age, 
  type AS contact_type, 
  value AS contact_value, 
  orderId AS order_id, 
  INLINE(products) AS (product_name, product_id, quantity), 
  totalAmount 
FROM 
  orders_cte
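
The same flattening can also be written as a single SELECT over the question's `dataset` CTE using `LATERAL VIEW EXPLODE`. The following is a minimal sketch, not a tested drop-in: the names `parsed`, `j`, `c_view`, `o_view`, `p_view`, `c`, `o` and `p` are illustrative assumptions, and the output column names are taken from the expected table in the question.

```sql
WITH dataset AS (
  -- Same sample document as in the question, written on one line.
  SELECT '{"id": 1, "name": "John Doe", "age": 30, "contacts": [{"type": "email", "value": "john.doe@example.com"}, {"type": "phone", "value": "555-1234"}], "orders": [{"orderId": "A123", "products": [{"productId": "P001", "name": "Product 1", "quantity": 2}, {"productId": "P002", "name": "Product 2", "quantity": 1}], "totalAmount": 150.99}, {"orderId": "B456", "products": [{"productId": "P003", "name": "Product 3", "quantity": 3}], "totalAmount": 75.50}]}' AS myblob
),
parsed AS (
  -- Parse the string column into a struct; fields are matched by name.
  SELECT
    FROM_JSON(
      myblob,
      'id BIGINT, name STRING, age BIGINT, contacts ARRAY<STRUCT<type: STRING, value: STRING>>, orders ARRAY<STRUCT<orderId: STRING, products: ARRAY<STRUCT<productId: STRING, name: STRING, quantity: BIGINT>>, totalAmount: DOUBLE>>'
    ) AS j
  FROM dataset
)
SELECT
  j.id,
  j.name,
  j.age,
  c.type      AS contact_type,
  c.value     AS contact_value,
  o.orderId   AS order_id,
  p.productId AS product_id,
  p.name      AS product_name,
  p.quantity,
  o.totalAmount
FROM parsed
LATERAL VIEW EXPLODE(j.contacts) c_view AS c  -- one row per contact
LATERAL VIEW EXPLODE(j.orders)   o_view AS o  -- one row per order
LATERAL VIEW EXPLODE(o.products) p_view AS p  -- one row per product within the order
```

Because contacts and orders are independent arrays, every contact is paired with every order line, so the sample produces 6 rows rather than the 3 shown in the question. If a document may have an empty or missing array, `LATERAL VIEW OUTER` keeps the row and fills the exploded columns with NULL. When the schema is not known up front, calling `SCHEMA_OF_JSON` on a sample literal returns a DDL string that can be passed to `FROM_JSON`.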