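I have two data sources from which I retrieve JSON in different formats, and I want to create a normalized object that represents the merged JSON, with fields unified according to their values.
For example, the first JSON: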
[
{
"zone_group": "us-east-1b",
"kernel_version": "5.10.130-118.517.amzn2.x86_64",
"chassis_type": "1",
"chassis_type_desc": "Other",
"connection_ip": "172.xx.xx.xx",
"default_gateway_ip": "172.xx.xx.xx",
"connection_mac_address": "12-5e-2e-db-xx-xx"
...
}
]
The second JSON:
[
{
"sourceInfo": {
"list": [
{
"Ec2AssetSourceSimple": {
"instanceType": "t2.micro",
"groupName": "AutoScaling-Group-1",
"macAddress": "12-5e-2e-db-xx-xx",
"monitoringEnabled": "false",
"spotInstance": "false",
"zone": "VPC",
"instanceState": "RUNNING",
"type": "EC_2",
"availabilityZone": "us-east-1b",
"privateIpAddress": "172.xx.xx.xx",
"firstDiscovered": "2022-08-18T22:23:04Z"
...
}
]
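I want to normalize the two JSON documents and create a unified representation of them based on their values. In this example, the IP address "172.xx.xx.xx" should appear only once in the normalized object (with the name taken from the first JSON, although the name doesn't really matter).
How should I go about this?
IIUC:
Code: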
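import pandas as pd
# json1 and json2 are the parsed JSON lists shown above
df1 = pd.DataFrame(data=json1)
# Flatten every record under sourceInfo["list"] into dotted column names
df2 = pd.json_normalize(data=[x.get("sourceInfo") for x in json2], record_path="list")
# Put the two frames side by side; rows are aligned by index position
final_df = pd.concat(objs=[df1, df2], axis=1)
print(final_df)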
Output:
zone_group kernel_version chassis_type chassis_type_desc connection_ip default_gateway_ip connection_mac_address Ec2AssetSourceSimple.instanceType Ec2AssetSourceSimple.groupName Ec2AssetSourceSimple.macAddress Ec2AssetSourceSimple.monitoringEnabled Ec2AssetSourceSimple.spotInstance Ec2AssetSourceSimple.zone Ec2AssetSourceSimple.instanceState Ec2AssetSourceSimple.type Ec2AssetSourceSimple.availabilityZone Ec2AssetSourceSimple.privateIpAddress Ec2AssetSourceSimple.firstDiscovered
0 us-east-1b 5.10.130-118.517.amzn2.x86_64 1 Other 172.xx.xx.xx 172.xx.xx.xx 12-5e-2e-db-xx-xx t2.micro AutoScaling-Group-1 12-5e-2e-db-xx-xx false false VPC RUNNING EC_2 us-east-1b 172.xx.xx.xx 2022-08-18T22:23:04Z
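Note that `pd.concat(..., axis=1)` pairs rows by index position and keeps every column, so the IP address, MAC address and zone still appear twice in the result. If the goal is for each shared value to appear once, one possible approach is to rename the second source's columns to the first source's names and join on them. The sketch below is only an illustration: the trimmed sample data and the `rename_map` correspondence are assumptions rather than anything stated in the question, and it assumes those three fields identify the same asset in both sources.

```python
import pandas as pd

# Hypothetical, trimmed-down sample data shaped like the two JSON documents.
json1 = [
    {
        "zone_group": "us-east-1b",
        "connection_ip": "172.xx.xx.xx",
        "connection_mac_address": "12-5e-2e-db-xx-xx",
    }
]
json2 = [
    {
        "sourceInfo": {
            "list": [
                {
                    "Ec2AssetSourceSimple": {
                        "instanceType": "t2.micro",
                        "macAddress": "12-5e-2e-db-xx-xx",
                        "availabilityZone": "us-east-1b",
                        "privateIpAddress": "172.xx.xx.xx",
                    }
                }
            ]
        }
    }
]

df1 = pd.DataFrame(data=json1)
df2 = pd.json_normalize(
    data=[x.get("sourceInfo") for x in json2], record_path="list"
)
# Drop the "Ec2AssetSourceSimple." prefix to get plain field names.
df2.columns = [c.split(".", 1)[-1] for c in df2.columns]

# Assumed correspondence: second-source name -> first-source name.
rename_map = {
    "macAddress": "connection_mac_address",
    "privateIpAddress": "connection_ip",
    "availabilityZone": "zone_group",
}
df2 = df2.rename(columns=rename_map)

# Join on the shared fields so each shared value is stored once per row.
keys = list(rename_map.values())
merged = df1.merge(df2, on=keys, how="outer")
print(merged.to_dict(orient="records"))
```

With `how="outer"`, assets that appear in only one source are kept and their missing fields are filled with NaN; `how="inner"` would keep only the assets found in both.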