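We have a column that contains JSON data with a dynamic schema, and the column is StringType. We want to convert the string into an array of JSON objects. We can't use array(), because it just puts the whole string at index 0 of the array. from_json also requires a fixed schema, so it doesn't seem to fit this case. Does anyone know how we can do this? Here is a sample JSON string. Keep in mind that this is the string from just one row, and we have many such rows: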
[
{
"name": "additional_product_information",
"value":
[
{
"valueType": "str",
"text": "Faux-fur booties. Memory foam insoles. Soft and comfy, like walking on air. ...",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[2]/article[1]/section[1]/div[6]/div[2]",
"attributes":
[]
}
},
{
"valueType": "str",
"text": "Faux-fur booties. Memory foam insoles. Soft and comfy, like walking on air.",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[2]/article[1]/section[1]/div[6]/div[3]",
"attributes":
[]
}
},
{
"valueType": "str",
"text": "100% Polyester.",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[2]/article[1]/section[1]/div[6]/div[4]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "brand_name",
"value":
[
{
"valueType": "str",
"text": "White Stuff",
"element":
{
"location": "/html[0]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "bread_crumb1",
"value":
[
{
"valueType": "str",
"text": "Womenswear",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[2]/article[1]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "bread_crumb2",
"value":
[
{
"valueType": "str",
"text": "Slippers",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[2]/article[1]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "bread_crumb3",
"value":
[],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "color_name",
"value":
[],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "customer_star_ratings",
"value":
[],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "has_retail_offer",
"value":
[
{
"valueType": "str",
"text": "D"
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "ASSISTED_SCRAPE"
},
{
"name": "image_url",
"value":
[
{
"valueType": "str",
"text": "https://xcdn.next.co.uk/COMMON/Items/Default/Default/ItemImages/AltItemZoom/M71395s.jpg",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[1]/section[1]/div[1]/div[3]/div[1]/div[2]/ul[1]/li[1]",
"attributes":
[]
}
},
{
"valueType": "str",
"text": "https://xcdn.next.co.uk/COMMON/Items/Default/Default/ItemImages/AltItemZoom/M71395s2.jpg",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[1]/section[1]/div[1]/div[3]/div[1]/div[2]/ul[1]/li[2]",
"attributes":
[]
}
},
{
"valueType": "str",
"text": "https://xcdn.next.co.uk/COMMON/Items/Default/Default/ItemImages/AltItemZoom/M71395s3.jpg",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[2]/div[1]/section[1]/section[1]/div[1]/div[3]/div[1]/div[2]/ul[1]/li[3]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "is_refurbished",
"value":
[
{
"valueType": "str",
"text": "N"
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "ASSISTED_SCRAPE"
},
{
"name": "item_length_width_height_weight",
"value":
[],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
},
{
"name": "item_number",
"value":
[
{
"valueType": "str",
"text": "M71395",
"element":
{
"location": "/html[0]/body[1]/section[1]/section[1]/div[1]/div[2]/div[5]",
"attributes":
[]
}
}
],
"sourceValue":
[],
"exactSourceValue": true,
"inferredFrom":
[],
"inferredFromSource":
[],
"strategyId": "Config"
}
]
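These are the two possible solutions I have come up with:
1. Parse the JSON, then encode each element of the array back into a string (so I end up with an array<string> column).
2. Define the schema as just name and value[text:], since those are the 2 data items I need to extract, and then use the from_json function. [Currently the preferred solution]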
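
To make the second option concrete, here is a minimal sketch of what we have in mind, assuming PySpark. The function name extract_name_and_texts, the constant PARTIAL_SCHEMA, and the intermediate column names are made up for illustration. It relies on from_json accepting a DDL-formatted type string and skipping JSON fields that are not listed in the schema.

```python
# Partial schema in DDL form: only the two items we need (name, value[].text).
# Fields missing from the schema (element, strategyId, ...) are simply skipped.
PARTIAL_SCHEMA = "array<struct<name:string,value:array<struct<text:string>>>>"


def extract_name_and_texts(df, json_col):
    """Return one row per attribute: its name and the list of its value texts.

    `df` is a Spark DataFrame and `json_col` the name of its StringType column.
    """
    # Imported lazily so the module loads even where pyspark is not installed.
    from pyspark.sql import functions as F

    parsed = df.withColumn("attrs", F.from_json(F.col(json_col), PARTIAL_SCHEMA))
    return (
        parsed
        # explode turns the array of structs into one row per struct
        .select(F.explode("attrs").alias("attr"))
        .select(
            F.col("attr.name").alias("name"),
            # selecting a field through an array of structs yields array<string>
            F.col("attr.value.text").alias("texts"),
        )
    )
```

With the sample row above we would expect brand_name to come out with a single text and bread_crumb3 with an empty list, but we have not verified this on the full data set.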
If you have a better approach, or better ideas for implementing either of the solutions above, please let me know. Thanks a lot :)
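
For comparison, the first option could be sketched as below, again assuming PySpark. The helper names split_json_array and with_json_array are hypothetical. The parsing itself is plain Python, so it needs no schema at all, at the cost of running a Python UDF for every row.

```python
import json


def split_json_array(raw):
    """Parse a JSON array string and re-encode each element as its own JSON string."""
    if raw is None:
        return None  # keep nulls as nulls
    return [json.dumps(item) for item in json.loads(raw)]


def with_json_array(df, json_col, out_col):
    """Add `out_col` (array<string>) holding one JSON string per array element."""
    # Imported lazily so split_json_array can be used and tested without pyspark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, StringType

    split_udf = F.udf(split_json_array, ArrayType(StringType()))
    return df.withColumn(out_col, split_udf(F.col(json_col)))
```

Each element of the resulting array is still a JSON string, so individual fields would have to be pulled out afterwards, for example with get_json_object.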