Raising errors in a Python list comprehension (or a better alternative)

Question

I have a nested structure, read from a JSON string, that looks similar to the following...

[
  {
    "id": 1,
    "type": "test",
    "sub_types": [
      {
        "id": "a",
        "type": "sub-test",
        "name": "test1"
      },
      {
        "id": "b",
        "name": "test2",
        "key_value_pairs": [
          {
            "key": 0,
            "value": "Zero"
          },
          {
            "key": 1,
            "value": "One"
          }
        ]
      }
    ]
  }
]

I need to extract and pivot the data so that it is ready to be inserted into a database...

[
  (1, "b", 0, "Zero"),
  (1, "b", 1, "One")
]

I do so with the following...

data_list = [
  (
    type['id'],
    sub_type['id'],
    key_value_pair['key'],
    key_value_pair['value']
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type
  for key_value_pair in sub_type['key_value_pairs']
]

So far, so good.

What I need to do next, however, is enforce some constraints. For example...

if type['type'] == 'test': raise ValueError('[test] types can not contain key_value_pairs.')

But I can't put that into the comprehension, and I don't want to resort to loops. My best idea so far is...

def make_row(type, sub_type, key_value_pair):
    if type['type'] == 'test': raise ValueError('sub-types of a [test] type can not contain key_value_pairs.')
    return (
        type['id'],
        sub_type['id'],
        key_value_pair['key'],
        key_value_pair['value']
    )

data_list = [
  make_row(
    type,
    sub_type,
    key_value_pair
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type
  for key_value_pair in sub_type['key_value_pairs']
]

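This works, but it runs the check for every single key_value_pair, which feels redundant. (Each set of key-value pairs could contain thousands of pairs, and the check only needs to be made once to know that they are all fine.)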

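Also, there will be other rules similar to this one that apply at different levels of the hierarchy. For example, "test" types can only contain "sub_test" sub_types.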

What are the options other than the above?

More elegant?
More extensible?
More performant?
More "Pythonic"?

Answer 1

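You should read about how to validate your JSON data and specify explicit schema constraints with JSON Schema. This library allows you to set required keys, specify default values, add type validation, etc.

The library has a Python implementation here: jsonschema package

Example: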

from jsonschema import Draft6Validator

schema = {
    "$schema": "https://json-schema.org/schema#",

    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["email"]
}
Draft6Validator.check_schema(schema)

Answer 2

I would just use a plain loop, but you can add the check to one of the comprehension's conditions if you turn the statement into a function:

def type_check(type):
    if type['type'] == 'test':
        raise ValueError('sub-types of a [test] type can not contain key_value_pairs.')
    return True


data_list = [
  (
    type['id'],
    sub_type['id'],
    key_value_pair['key'],
    key_value_pair['value']
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type and type_check(type)
  for key_value_pair in sub_type['key_value_pairs']
]
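
This answer mentions a plain loop without showing one, so here is a minimal sketch of what it might look like. The function name build_rows and the variable name type_ (renamed so it does not shadow the built-in type) are assumptions made for illustration and do not come from the question. The constraint is checked once per sub_type, before any of its key/value pairs are touched.

```python
def build_rows(parsed):
    """Flatten the nested structure into (type_id, sub_type_id, key, value) rows."""
    rows = []
    for type_ in parsed:
        # Types without sub_types contribute no rows.
        for sub_type in type_.get('sub_types', []):
            if 'key_value_pairs' not in sub_type:
                continue
            # Checked once per sub_type, not once per key/value pair.
            if type_['type'] == 'test':
                raise ValueError(
                    'sub-types of a [test] type can not contain key_value_pairs.'
                )
            rows.extend(
                (type_['id'], sub_type['id'], pair['key'], pair['value'])
                for pair in sub_type['key_value_pairs']
            )
    return rows
```

Note that the sample data in the question has "type": "test" on the outer object, so running this on it unchanged would raise the ValueError; that is the constraint doing its job.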

Answer 3

You could try an architecture along the lines of

def validate_top(obj):
    if obj['type'] in BAD_TYPES:
        raise ValueError("oof")
    elif obj['type'] not in IRRELEVANT_TYPES: # actually need to include this
        yield obj

def validate_middle(obj):
    # similarly for the next nested level of data

# and so on

[
    make_row(r)
    for t in validate_top(my_json)
    for m in validate_middle(t)
    # etc...
    for r in validate_last(whatever)
]

The general pattern is that I use generators (functions, not expressions) to process the data, and then a comprehension to collect it.

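In simpler cases, where it isn't worth separating out multiple levels of processing (or they don't naturally exist), you can still write a single generator and just do list(generator(source)). This, to my mind, is still cleaner than using an ordinary function and building the list manually - it still separates the "processing" and "collecting" concerns.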
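
To make the pattern concrete for the data in the question, here is one possible sketch. The names checked_sub_types and iter_rows, and the sample list my_parsed_json_array with "type": "other", are all assumptions for illustration; only the key_value_pairs rule from the question is implemented.

```python
def checked_sub_types(type_):
    """Yield the sub_types that carry key_value_pairs, validating each once."""
    for sub_type in type_.get('sub_types', []):
        if 'key_value_pairs' not in sub_type:
            continue
        # One check per sub_type, however many pairs it holds.
        if type_['type'] == 'test':
            raise ValueError(
                'sub-types of a [test] type can not contain key_value_pairs.'
            )
        yield sub_type


def iter_rows(parsed):
    """Processing step: lazily yield (type_id, sub_type_id, key, value) tuples."""
    for type_ in parsed:
        for sub_type in checked_sub_types(type_):
            for pair in sub_type['key_value_pairs']:
                yield (type_['id'], sub_type['id'], pair['key'], pair['value'])


# Hypothetical sample data; the outer type is "other" so the rule is satisfied.
my_parsed_json_array = [
    {"id": 1, "type": "other", "sub_types": [
        {"id": "a", "type": "sub-test", "name": "test1"},
        {"id": "b", "name": "test2", "key_value_pairs": [
            {"key": 0, "value": "Zero"},
            {"key": 1, "value": "One"},
        ]},
    ]},
]

# Collecting step: materialize the generator into a list.
data_list = list(iter_rows(my_parsed_json_array))
```

Because generators are lazy, a ValueError surfaces only when the rows are consumed, here during the list(...) call, not when iter_rows is first called.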
