jq: recursively profile a JSON object

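I have some huge JSON files that I need to profile so that I can convert them into tables. I've found jq really useful for inspecting them, but there will be hundreds of files, and I'm new to jq.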

I already have some very handy functions in my ~/.jq (many thanks to @mikehwang):

def profile_object:
    to_entries | def parse_entry: {"key": .key, "value": .value | type}; map(parse_entry)
        | sort_by(.key) | from_entries;

def profile_array_objects:
    map(profile_object) | map(to_entries) | reduce .[] as $item ([]; . + $item) | sort_by(.key) | from_entries;

I'm sure I will have to modify them once I've described my problem.

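I would like a jq one-liner that profiles a single object. If a key maps to an array of objects, collect the unique keys across those objects, and keep profiling if there are nested arrays of objects inside. If a value is an object, profile that object.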

Sorry for the long example, but imagine this being several GB:

{
    "name": "XYZ Company",
    "type": "Contractors",
    "reporting": [
        {
            "group_id": "660",
            "groups": [
                {
                    "ids": [
                        987654321,
                        987654321,
                        987654321
                    ],   
                    "market": {
                        "name": "Austin, TX",
                        "value": "873275"
                    }
                },
                {
                    "ids": [
                        987654321,
                        987654321,
                        987654321
                    ],   
                    "market": {
                        "name": "Nashville, TN",
                        "value": "2393287"
                    }
                }
            ]
        }
    ],
    "product_agreements": [
        {
            "negotiation_arrangement": "FFVII",
            "code": "84144",
            "type": "DJ",
            "type_version": "V10",
            "description": "DJ in a mask",
            "name": "Claptone",
            "negotiated_rates": [
                {
                    "company_references": [
                        1,
                        5,
                        458
                    ],
                    "negotiated_prices": [
                        {
                            "type": "negotiated",
                            "rate": 17.73,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_modifier_code": [
                                "124"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                },
                {
                    "company_references": [
                        747
                    ],
                    "negotiated_prices": [
                        {
                            "type": "fee",
                            "rate": 28.42,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                }
            ]
        },
        {
            "negotiation_arrangement": "MGS3",
            "name": "David Byrne",
            "type": "Producer",
            "type_version": "V10",
            "code": "654321",
            "description": "Frontman from Talking Heads",
            "negotiated_rates": [
                {
                    "company_references": [
                        1,
                        9,
                        2344,
                        8456
                    ],
                    "negotiated_prices": [
                        {
                            "type": "negotiated",
                            "rate": 68.73,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                },
                {
                    "company_references": [
                        679
                    ],
                    "negotiated_prices": [
                        {
                            "type": "fee",
                            "rate": 89.25,
                            "expiration_date": "9999-12-31",
                            "code": [
                                "11"
                            ],
                            "billing_class": "professional"
                        }
                    ]
                }
            ]
        }
    ],
    "version": "1.3.1",
    "last_updated_on": "2023-02-01"
}

Desired output:

{
    "name": "string",
    "type": "string",
    "reporting": [
      {
        "group_id": "string",
        "groups": [
            {
                "ids": [
                    "number"
                ],
                "market": {
                    "name": "string",
                    "value": "string"
                }
            }
        ]
      }
    ],
    "product_agreements": [
      {
        "negotiation_arrangement": "string",
        "code": "string",
        "type": "string",
        "type_version": "string",
        "description": "string",
        "name": "string",
        "negotiated_rates": [
          {
            "company_references": [
                "number"
            ],
            "negotiated_prices": [
              {
                "type": "string",
                "rate": "number",
                "expiration_date": "string",
                "code": [
                  "string"
                ],
                "billing_modifier_code": [
                  "string"
                ],
                "billing_class": "string"
              }
            ]
          }
        ]        
      }
    ],
    "version": "string",
    "last_updated_on": "string"
}

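I apologize if there are any errors in it, but I tried to keep it consistent and as simple as possible.

To restate the requirement: recursively profile every key in a JSON object whose value is an object or an array. The solution needs to be independent of key names. Happy to clarify further if needed.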

json sed schema jq
3 Answers

Answer 1 (1 vote)

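The jq module schema.jq, available at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed, is designed to produce the kind of structural schema you describe.

For very large inputs it may be very slow, so if the JSON is regular enough, a hybrid strategy is possible: profile enough of the data to derive a comprehensive structural schema, then check that the schema really does apply to the rest.

For conformance testing against structural schemas, such as those produced by schema.jq, see https://github.com/pkoppstein/JESS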
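The "check that it applies" half of the hybrid strategy could look roughly like the Python sketch below. It is not JESS and does not follow JESS semantics. The function names `json_type` and `conforms` are hypothetical. The matching rules are also assumptions: an object may omit schema keys but may not add new ones, and each array item must match at least one alternative listed in the schema array.

```python
def json_type(value):
    """Return the jq-style type name of a parsed JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def conforms(value, schema):
    """Check a parsed JSON value against a structural schema (assumed rules)."""
    if isinstance(schema, str):
        # Leaf: the schema holds a type name such as "string" or "number".
        return json_type(value) == schema
    if isinstance(schema, dict):
        # Object: every key present must be known and must conform.
        return isinstance(value, dict) and all(
            key in schema and conforms(child, schema[key])
            for key, child in value.items()
        )
    if isinstance(schema, list):
        # Array: every item must match at least one listed alternative.
        return isinstance(value, list) and all(
            any(conforms(item, alt) for alt in schema) for item in value
        )
    return False


record_schema = {"group_id": "string", "ids": ["number"]}
print(conforms({"group_id": "660", "ids": [1, 2]}, record_schema))
print(conforms({"group_id": 660}, record_schema))
```

With something like this, the expensive profiling pass runs only on a sample, and the remaining records are checked with a cheaper pass that can stop at the first mismatch.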


Answer 2 (1 vote)

Given your input.json, here is a solution:

jq '
def schema:
    if   type == "object" then .[] |= schema
    elif type == "array"  then map(schema)|unique
         | if (first | type) == "object" then [add] else . end
    else type
    end;
schema
' input.json
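For readers who find the recursion easier to follow outside jq, here is a loose Python port of the same idea. It is a sketch, not an exact translation. It assumes arrays are homogeneous (all objects or all non-objects), and it removes duplicates in first-seen order where jq's `unique` sorts. When two objects in an array disagree about a key, the surviving value may therefore differ from the jq version. The helper name `jq_type` is hypothetical.

```python
import json


def jq_type(value):
    """Return the jq-style type name of a parsed JSON scalar."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not a JSON scalar: {value!r}")


def schema(value):
    """Replace every leaf with its type name and collapse arrays."""
    if isinstance(value, dict):
        return {key: schema(child) for key, child in value.items()}
    if isinstance(value, list):
        # Deduplicate profiled items, keyed by a canonical JSON string.
        seen = {}
        for item in value:
            profiled = schema(item)
            seen.setdefault(json.dumps(profiled, sort_keys=True), profiled)
        unique = list(seen.values())
        if unique and isinstance(unique[0], dict):
            # Array of objects: merge into one object, later keys win.
            merged = {}
            for obj in unique:
                merged.update(obj)
            return [merged]
        return unique
    return jq_type(value)


sample = {
    "name": "XYZ Company",
    "reporting": [
        {"group_id": "660", "ids": [1, 2]},
        {"group_id": "661", "rate": 1.5},
    ],
}
print(json.dumps(schema(sample), indent=2))
```

Here the two `reporting` objects are merged into a single object carrying `group_id`, `ids`, and `rate`, which is the "collect the unique keys" behaviour the question asks for.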

Answer 3 (0 votes)

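Here is a variant of @Philippe's solution: it coalesces the objects in map(schema) for arrays in a principled but lossy way. (All of these half-solutions sacrifice some precision in exchange for speed.)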

Note that keys_unsorted is used below; if you use gojq, you must change it to keys, or provide a def of keys_unsorted.

# Use "JSON" as the union of two distinct types
# except combine([]; [ $x ]) => [ $x ]
def combine($a;$b):
  if $a == $b then $a elif $a == null then $b elif $b == null then $a
  elif ($a == []) and ($b|type) == "array" then $b
  elif ($b == []) and ($a|type) == "array" then $a
  else "JSON"
  end;

# Profile an array by calling mergeTypes(.[] | schema)
# in order to coalesce objects
def mergeTypes(s):
    reduce s as $t (null;
       if ($t|type) != "object" then .types = (.types + [$t] | unique)
       else .object as $o
       | .object = reduce ($t | keys_unsorted[]) as $k ($o;
                    .[$k] = combine( $t[$k]; $o[$k] ) 
          )
       end)
       | (if .object then [.object] else null end ) + .types ;

def schema:
    if   type == "object" then .[] |= schema
    elif type == "array"
    then if . == [] then [] else mergeTypes(.[] | schema) end
    else type
    end;
schema

Example input:

{"a": [{"b":[1]}, {"c":[2]}, {"c": []}] }

Output:

{
  "a": [
    {
      "b": [
        "number"
      ],
      "c": [
        "number"
      ]
    }
  ]
}