I have a problem. I have a nested JSON file:
json_data = '''
{
    "appVersion": "",
    "device": {
        "model": ""
    },
    "bef": {
        "catalog": ""
    },
    "data": [
        {
            "timestamp": "",
            "label": "",
            "category": ""
        }
    ]
}
'''
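I want to extract all of the data, and where a value is nested, I want the key path joined with _.
I tried to normalize the nested JSON using pd.json_normalize.
Unfortunately, the output I get is not what I want or need.
I would also like this to work for any number of nested values, so I tried to solve it with a loop.
How can I produce the desired output?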
import pandas as pd
import json

json_data = '''
{
    "appVersion": "0.0.3",
    "device": {
        "model": "Lenovo"
    },
    "bef": {
        "catalog": "Manual"
    },
    "data": [
        {
            "timestamp": "2024-04-24 12:08:02.415077",
            "label": "zuf",
            "category": "50"
        }
    ]
}
'''

parsed_json = json.loads(json_data)

def extract_metadata(json_data):
    metadata = {}
    for key, value in json_data.items():
        if isinstance(value, dict):
            for k, v in value.items():
                metadata[f'{key}_{k}'] = v
        else:
            metadata[key] = value
    return metadata

meta_data = extract_metadata(parsed_json)
df_main = pd.json_normalize(parsed_json['data'], sep='_')
df_meta = pd.DataFrame([meta_data])
df = pd.concat([df_main, df_meta], axis=1)
print(df)
What I get:
                    timestamp label category appVersion device_model  \
0  2024-04-24 12:08:02.415077   zuf       50      0.0.3       Lenovo

  bef_catalog                                               data
0      Manual  [{'timestamp': '2024-04-24 12:08:02.415077', '...
What I want:
appVersion  device_model  bef_catalog  data_timestamp           data_label  data_category
0.0.3       Lenovo        Manual       2024-04-24 12:08:02.415  zuf         50
import pandas as pd
import json

def flatten_json(json_obj, parent_key='', sep='_'):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    items = {}
    for k, v in json_obj.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, dict):
            items.update(flatten_json(v, new_key, sep=sep))
        elif isinstance(v, list):
            # List elements are indexed from 0 so several entries do not collide
            for i, item in enumerate(v):
                if isinstance(item, dict):
                    items.update(flatten_json(item, f"{new_key}{sep}{i}", sep=sep))
                else:
                    # Scalars inside a list have no .items(); store them directly
                    items[f"{new_key}{sep}{i}"] = item
        else:
            items[new_key] = v
    return items

# Your JSON data
json_data = '''
{
    "appVersion": "0.0.3",
    "device": {
        "model": "Lenovo"
    },
    "bef": {
        "catalog": "Manual"
    },
    "data": [
        {
            "timestamp": "2024-04-24 12:08:02.415",
            "label": "zuf",
            "category": 50
        }
    ]
}
'''

# Parse the JSON data
parsed_data = json.loads(json_data)

# Flatten the JSON data
flattened_data = flatten_json(parsed_data)

# Convert flattened data to DataFrame
df = pd.DataFrame(flattened_data, index=[0])
print(df)
You can flatten your dictionary first, like this:
def flatten_dict(d: dict, pre=''):
    new_d = {}
    for key, item in d.items():
        if isinstance(item, dict):
            new_d = {**new_d, **flatten_dict(item, pre=f'{pre}{key}_')}
        elif isinstance(item, list):
            for i, ele in enumerate(item):
                if isinstance(ele, dict):
                    new_d = {**new_d, **flatten_dict(ele, pre=f'{pre}{key}_{i+1}_')}
                else:
                    new_d[f'{pre}{key}_{i+1}'] = ele
        else:
            new_d[f'{pre}{key}'] = item
    return new_d
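Your code did not handle the list type. I added an enumerator because a list may contain more than one dictionary. If you are sure every list will always have at most one element, you can drop the i from the key. Alternatively, add a check to see whether the list has length 1.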
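If the lists really do hold a single element, a variant along these lines drops the index and gives the data_timestamp style keys asked for in the question. This is only a sketch: flatten_dict_compact and its sep parameter are names made up here for illustration, and a list nested directly inside another list is stored as-is rather than flattened.

```python
import json


def flatten_dict_compact(d: dict, pre: str = '', sep: str = '_') -> dict:
    """Flatten nested dicts/lists; omit the index for single-element lists.

    Hypothetical helper, not part of the original answer.
    """
    new_d = {}
    for key, item in d.items():
        name = f'{pre}{key}'
        if isinstance(item, dict):
            new_d.update(flatten_dict_compact(item, pre=f'{name}{sep}', sep=sep))
        elif isinstance(item, list):
            for i, ele in enumerate(item, start=1):
                # Number the entries only when the list has more than one
                ele_name = name if len(item) == 1 else f'{name}{sep}{i}'
                if isinstance(ele, dict):
                    new_d.update(
                        flatten_dict_compact(ele, pre=f'{ele_name}{sep}', sep=sep)
                    )
                else:
                    new_d[ele_name] = ele
        else:
            new_d[name] = item
    return new_d


sample = json.loads('''
{
    "appVersion": "0.0.3",
    "device": {"model": "Lenovo"},
    "bef": {"catalog": "Manual"},
    "data": [
        {"timestamp": "2024-04-24 12:08:02.415077", "label": "zuf", "category": "50"}
    ]
}
''')

flat = flatten_dict_compact(sample)
print(list(flat))
# ['appVersion', 'device_model', 'bef_catalog',
#  'data_timestamp', 'data_label', 'data_category']
```

As soon as a list has two or more elements the keys fall back to the numbered form (data_1_label, data_2_label, ...), so the column names depend on the data. Whether that is acceptable is a judgment call for your use case.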
To convert it to pandas:
pd.json_normalize(flatten_dict(parsed_json))
Output of the function:
flatten_dict(parsed_json)
{'appVersion': '0.0.3',
'device_model': 'Lenovo',
'bef_catalog': 'Manual',
'data_1_timestamp': '2024-04-24 12:08:02.415077',
'data_1_label': 'zuf',
'data_1_category': '50'}
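If data can hold several entries and you would rather have one row per entry than numbered columns, something like the following might be closer to what you need. It is a rough sketch under a few assumptions: rows_per_record and flatten_scalars are hypothetical names, every element of the record list is assumed to be a dict, any other lists are simply skipped, and the second record in the sample is invented for illustration.

```python
def flatten_scalars(d: dict, pre: str = '', sep: str = '_') -> dict:
    """Flatten nested dicts only; list values are skipped (hypothetical helper)."""
    out = {}
    for key, item in d.items():
        if isinstance(item, dict):
            out.update(flatten_scalars(item, pre=f'{pre}{key}{sep}', sep=sep))
        elif not isinstance(item, list):
            out[f'{pre}{key}'] = item
    return out


def rows_per_record(doc: dict, record_key: str = 'data', sep: str = '_') -> list:
    """Build one flat row per dict in doc[record_key], repeating the metadata."""
    meta = flatten_scalars(
        {k: v for k, v in doc.items() if k != record_key}, sep=sep
    )
    rows = []
    for record in doc.get(record_key, []):
        row = dict(meta)  # copy so each row owns its metadata
        row.update(flatten_scalars(record, pre=f'{record_key}{sep}', sep=sep))
        rows.append(row)
    return rows


doc = {
    'appVersion': '0.0.3',
    'device': {'model': 'Lenovo'},
    'bef': {'catalog': 'Manual'},
    'data': [
        {'timestamp': '2024-04-24 12:08:02.415077', 'label': 'zuf', 'category': '50'},
        # Invented second record, only here to show the multi-row case
        {'timestamp': '2024-04-24 12:09:00.000000', 'label': 'abc', 'category': '51'},
    ],
}

rows = rows_per_record(doc)
for row in rows:
    print(row)
```

Passing the resulting list of dicts to pd.DataFrame(rows) should then give one DataFrame row per entry, with the same column names regardless of how many entries there are.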