Deserializing a nested JSON structure into Django model objects (DRF serializers)


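I need to consume a service that sends JSON responses containing JSON-serialized nested structures, which I want to deserialize and store in my database. My application uses Django.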

The business rules are as follows:

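  1. The objects returned by a query always have an id property, which is a unique integer, and usually createdAt and updatedAt properties, both holding datetime data. They then have several other properties of primitive types (int, float, str, datetime, etc.) and a few properties whose value can be another object or an array of objects.

  2. If a property value is an object, the parent object is related to it through a foreign key. If it is an array of objects, there are two cases: either the objects in the array are related to the parent through a foreign key, or the parent and each member of the array are related through a many-to-many relationship.

  3. I need to mirror every object in the database, so each model has an id field as its primary key, but it is not auto-generated, because the real ID comes with the imported data.

  4. The relationships between all these entities are already reflected in my model schema. I took this approach (mirroring the data structure) because flattening the received data to save it all into a single table would cause horrendous duplication, violating every data-normalization rule.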

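  5. For each root object, I need to:

     • check whether a record with that id exists in the database;
     • if not, create a new record;
     • if it does, update the existing record (the update may be skipped if the record and the incoming data have the same updatedAt value);
     • repeat the same steps for every nested object (the value supplied for one of its parent's properties).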
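Item 5 amounts to a create / update / skip decision per record, plus recursion into nested objects. Below is a minimal, framework-free sketch of the per-record decision. It assumes a plain dict stands in for one table and that the record is already flat (nested objects replaced by their ids); `upsert` and `Table` are hypothetical names, not Django APIs. In Django, the create/update half corresponds roughly to `Model.objects.update_or_create(id=..., defaults=...)`, which does not implement the updatedAt skip by itself.

```python
from typing import Any, Dict

# Hypothetical stand-in for one database table: primary key -> stored row.
Table = Dict[int, Dict[str, Any]]


def upsert(table: Table, record: Dict[str, Any]) -> str:
    """Apply the create / update / skip rule to one flat record."""
    pk = record["id"]
    existing = table.get(pk)
    if existing is None:
        table[pk] = dict(record)  # no row with this id yet: create it
        return "created"
    incoming = record.get("updatedAt")
    if incoming is not None and existing.get("updatedAt") == incoming:
        return "skipped"  # same updatedAt on both sides: nothing changed
    existing.update(record)  # otherwise overwrite with the incoming values
    return "updated"


branches: Table = {}
row = {"id": 20, "name": "REGIONAL OFFICE #3", "updatedAt": "2019-05-09 13:40:47"}
print(upsert(branches, row))  # created
print(upsert(branches, row))  # skipped
# Hypothetical later version of the same row:
print(upsert(branches, {**row, "name": "RENAMED", "updatedAt": "2020-01-01 00:00:00"}))  # updated
```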

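Below I reproduce a very simplified example of the data I receive from the service and of the models I want to store it in. The real thing is much, much larger and more complex, which is why I am so keen to learn a way to let the ORM take care of the problem. Hard-coding the whole thing would take forever, besides being error-prone and creating a maintenance hell if the data schema changes in the future.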

EDIT

Link to the previous simplified version of the following JSON and models.

JSON example:

    {
      "id": 37125965,
      "number": "029073432019403",
      "idCommunication": "1843768",
      "docReceivedAt": { "date": "2019-12-20 08:46:42" },
      "createdAt": { "date": "2019-12-20 09:01:14" },
      "updatedAt": { "date": "2019-12-20 09:01:32" },
      "branch": {
        "id": 20,
        "name": "REGIONAL OFFICE #3",
        "address": "457 Beau St., S\u00e3o Paulo, SP, 08547-003",
        "active": true,
        "createdAt": { "date": "2013-02-14 23:12:30" },
        "updatedAt": { "date": "2019-05-09 13:40:47" }
      },
      "modality": {
        "id": 1,
        "valor": "CITA\u00c7\u00c3O",
        "descricao": "CITA\u00c7\u00c3O",
        "active": true,
        "createdAt": { "date": "2014-08-29 20:47:56" },
        "updatedAt": { "date": "2014-08-29 20:47:56" }
      },
      "operation": {
        "id": 12397740,
        "number": "029073432019403",
        "startedAt": { "date": "2019-11-07 22:28:25" },
        "managementType": 27,
        "assessmentValue": 5000000,
        "createdAt": { "date": "2019-12-20 09:01:30" },
        "updatedAt": { "date": "2019-12-20 09:01:30" },
        "operationClass": {
          "id": 22,
          "name": "A\u00c7\u00c3O RESCIS\u00d3RIA",
          "createdAt": { "date": "2014-02-28 20:24:55" },
          "updatedAt": { "date": "2014-02-28 20:24:55" }
        },
        "evaluator": {
          "id": 26798,
          "name": "JANE DOE",
          "level": 1,
          "active": true,
          "createdAt": { "date": "2017-02-22 22:54:04" },
          "updatedAt": { "date": "2017-03-15 18:03:20" },
          "evaluatorsOffice": {
            "id": 7,
            "name": "ACME",
            "area": 4,
            "active": true,
            "createdAt": { "date": "2014-02-28 20:25:16" },
            "updatedAt": { "date": "2014-02-28 20:25:16" }
          },
          "evaluatorsOffice_id": 7
        },
        "operationClass_id": 22,
        "evaluator_id": 26798
      },
      "folder": {
        "id": 16901241,
        "singleDocument": false,
        "state": 0,
        "IFN": "00409504174201972",
        "closed": false,
        "dataHoraAbertura": { "date": "2019-12-20 09:01:31" },
        "dataHoraTransicao": { "date": "2024-12-20 09:01:31" },
        "titulo": "CONTROL FOLDER REF. OP. N. 029073432019403",
        "createdAt": { "date": "2019-12-20 09:01:32" },
        "updatedAt": { "date": "2019-12-20 09:01:32" },
        "subjects": [
          {
            "id": 22255645,
            "main": true,
            "createdAt": { "date": "2019-12-20 09:01:32" },
            "updatedAt": { "date": "2019-12-20 09:01:32" },
            "subjectClass": {
              "id": 20872,
              "name": "SPECIAL RETIREMENT PROCESS",
              "active": true,
              "regulation": "8.213/91, 53.831/64, 83.080/79, 2.172/97, 1.663/98, 9.711/98, 9.528/97 AND 9.032/95",
              "glossary": "SPECIAL RETIREMENT APPLICATION DUE TO HAZARDOUS LABOR CONDITION FOR 15+/20+/25+ YEARS",
              "createdAt": { "date": "2013-10-18 16:22:44" },
              "updatedAt": { "date": "2013-10-18 16:22:44" },
              "parent": {
                "id": 20866,
                "name": "RETIREMENT BENEFITS",
                "active": true,
                "createdAt": { "date": "2013-10-18 16:22:44" },
                "updatedAt": { "date": "2013-10-18 16:22:44" },
                "parent": {
                  "id": 20126,
                  "name": "SOCIAL SECURITY",
                  "active": true,
                  "createdAt": { "date": "2013-10-18 16:22:42" },
                  "updatedAt": { "date": "2013-10-18 16:22:42" }
                },
                "parent_id": 20126
              },
              "parent_id": 20866
            },
            "subjectClass_id": 20872
          }
        ],
        "person": {
          "id": 7318,
          "isClient": true,
          "isRelated": false,
          "name": "SOCSEC CO.",
          "createdAt": { "date": "2013-02-14 23:11:43" },
          "updatedAt": { "date": "2019-11-18 16:05:07" }
        },
        "operation": {
          "id": 12397740,
          "number": "029073432019403",
          "startedAt": { "date": "2019-11-07 22:28:25" },
          "managementType": 27,
          "assessmentValue": 5000000,
          "createdAt": { "date": "2019-12-20 09:01:30" },
          "updatedAt": { "date": "2019-12-20 09:01:30" }
        },
        "section": {
          "id": 311,
          "name": "PROTOCOL",
          "address": "457 Beau St., ground floor, S\u00e3o Paulo, SP, 08547-003",
          "active": true,
          "management": false,
          "onlyDistribution": true,
          "createdAt": { "date": "2013-02-14 23:12:31" },
          "updatedAt": { "date": "2019-07-05 16:40:34" },
          "branch": {
            "id": 20,
            "name": "REGIONAL OFFICE #3",
            "address": "457 Beau St., S\u00e3o Paulo, SP, 08547-003",
            "active": true,
            "createdAt": { "date": "2013-02-14 23:12:30" },
            "updatedAt": { "date": "2019-05-09 13:40:47" }
          },
          "branch_id": 20
        },
        "person_id": 7318,
        "operation_id": 12397740,
        "section_id": 311
      },
      "branch_id": 20,
      "modality_id": 1,
      "operation_id": 12397740,
      "folder_id": 16901241
    }
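Every timestamp in this payload is wrapped in a one-key object such as `{"date": "2019-12-20 09:01:32"}`, so any importer has to unwrap it before it can fill a `DateTimeField`. A small stdlib-only sketch follows; `unwrap_date` is a hypothetical helper, and treating the strings as UTC is an assumption, since the payload does not state a time zone (Django's `timezone.make_aware` would use the current time zone instead).

```python
from datetime import datetime, timezone
from typing import Any


def unwrap_date(value: Any) -> Any:
    """Return an aware datetime for {"date": "YYYY-MM-DD HH:MM:SS"}, else the value unchanged.

    Assumption: the service sends UTC; adjust tzinfo if it does not.
    """
    if isinstance(value, dict) and set(value) == {"date"}:
        naive = datetime.strptime(value["date"], "%Y-%m-%d %H:%M:%S")
        return naive.replace(tzinfo=timezone.utc)
    return value  # nested objects, arrays and primitives pass through


print(unwrap_date({"date": "2019-12-20 09:01:32"}))  # 2019-12-20 09:01:32+00:00
print(unwrap_date({"id": 20, "name": "REGIONAL OFFICE #3"}))  # nested object, returned as-is
```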
Example models.py:

    from django.db import models

    class Section(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        address = models.CharField(max_length=255, null=True)
        active = models.BooleanField(default=True)
        management = models.BooleanField(default=False)
        onlyDistribution = models.BooleanField(default=False)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        branch = models.ForeignKey('Branch', null=True, on_delete=models.SET_NULL)

    class Person(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        isClient = models.BooleanField(default=True)
        isRelated = models.BooleanField(default=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()

    class SubjectClass(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        active = models.BooleanField(default=True)
        regulation = models.CharField(max_length=255, null=True)
        glossary = models.CharField(max_length=255, null=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        parent = models.ForeignKey('SubjectClass', null=True, on_delete=models.SET_NULL)

    class Subject(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        main = models.BooleanField(default=False)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        folder = models.ForeignKey('Folder', null=True, on_delete=models.SET_NULL)
        subjectClass = models.ForeignKey(SubjectClass, null=True, on_delete=models.SET_NULL)

    class Folder(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        singleDocument = models.BooleanField(default=False)
        state = models.PositiveSmallIntegerField(null=True)
        IFN = models.CharField(max_length=31, null=True)
        closed = models.BooleanField(default=False)
        title = models.CharField(max_length=255, null=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        subjects = models.ManyToManyField(SubjectClass, through=Subject, through_fields=('folder', 'subjectClass'))
        interestedEntity = models.ForeignKey(Person, null=True, on_delete=models.SET_NULL)

    class EvaluatorsOffice(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        area = models.PositiveSmallIntegerField(null=True)
        active = models.BooleanField(default=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()

    class Evaluator(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        level = models.PositiveSmallIntegerField(null=True)
        active = models.BooleanField(default=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        evaluatorsOffice = models.ForeignKey(EvaluatorsOffice, null=True, on_delete=models.SET_NULL)

    class OperationClass(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        active = models.BooleanField(default=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()

    class Operation(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        number = models.CharField(max_length=31, null=True)
        startedAt = models.DateTimeField(null=True)
        managementType = models.PositiveIntegerField(null=True)
        assessmentValue = models.PositiveIntegerField(null=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        operationClass = models.ForeignKey(OperationClass, null=True, on_delete=models.SET_NULL)
        evaluator = models.ForeignKey(Evaluator, null=True, on_delete=models.SET_NULL)

    class Branch(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        name = models.CharField(max_length=255, null=True)
        address = models.CharField(max_length=255, null=True)
        active = models.BooleanField(default=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()

    class Modality(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        value = models.CharField(max_length=255, null=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()

    class CommunicationRecord(models.Model):
        id = models.PositiveIntegerField(primary_key=True)
        number = models.CharField(max_length=31, null=True)
        idCommunication = models.CharField(max_length=31, null=True)
        docReceivedAt = models.DateTimeField(null=True)
        createdAt = models.DateTimeField()
        updatedAt = models.DateTimeField()
        branch = models.ForeignKey(Branch, null=True, on_delete=models.SET_NULL)
        modality = models.ForeignKey(Modality, null=True, on_delete=models.SET_NULL)
        operation = models.ForeignKey(Operation, null=True, on_delete=models.SET_NULL)
        folder = models.ForeignKey(Folder, null=True, on_delete=models.SET_NULL)
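Because every `ForeignKey` target must exist before the row that points to it, the nested payload has to be saved from the inside out, while array items that point back to their parent (such as `Subject.folder`) have to come after it. A stdlib-only sketch of that ordering follows. It assumes nested objects are recognizable by an `"id"` key; `flatten` is a hypothetical helper that labels rows by the JSON property name rather than by model, so mapping names such as `parent` or `person` to models is left out. The payload's array items carry no back-reference to their parent, so that field would still have to be filled in when saving.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[str, Dict[str, Any]]  # (JSON key the object appeared under, flat fields)


def flatten(name: str, obj: Dict[str, Any], out: List[Row]) -> None:
    """Append obj and all of its nested objects to out in a save-safe order."""
    flat: Dict[str, Any] = {}
    children = []
    for key, value in obj.items():
        if isinstance(value, dict) and "id" in value:
            flatten(key, value, out)  # to-one target must be saved before the parent
            flat[key + "_id"] = value["id"]  # the parent keeps only the foreign key value
        elif isinstance(value, list):
            children.extend((key, item) for item in value)  # to-many: handle after the parent
        else:
            flat[key] = value  # primitives and {"date": ...} wrappers
    out.append((name, flat))
    for key, item in children:
        flatten(key, item, out)


payload = {
    "id": 16901241,
    "person": {"id": 7318, "name": "SOCSEC CO."},
    "subjects": [{"id": 22255645, "subjectClass": {"id": 20872}}],
}
order: List[Row] = []
flatten("folder", payload, order)
print([name for name, _ in order])  # ['person', 'folder', 'subjectClass', 'subjects']
```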

EDIT (regarding DRF serializers):

I'm trying to follow Max Malysh I Reinstate Monica's suggestion, and I've started working on a recursive serializer:

    from django.db.models import Manager, Model, Field, DateTimeField, ForeignKey
    from rest_framework.serializers import ModelSerializer


    class RecursiveSerializer(ModelSerializer):
        manager: Manager
        field_dict: dict

        def __init__(self, target_manager: Manager, data: dict, **kwargs):
            self.manager = target_manager
            self.Meta.model = self.manager.model
            self.field_dict = {f.name: f for f in self.manager.model._meta.fields}
            instance = None
            data = self.process_data(data)
            pk_name = self.manager.model._meta.pk.name
            if pk_name in data:
                try:
                    instance = target_manager.get(pk=data[pk_name])
                except target_manager.model.DoesNotExist:
                    pass
            super().__init__(instance, data, **kwargs)

        def process_data(self, data: dict):
            processed_data = {}
            for name, value in data.items():
                field: Field = self.field_dict.get(name)
                if isinstance(value, dict):
                    if isinstance(field, ForeignKey):
                        processed_data[name] = self.__class__(field.related_model.objects, data=value)
                        continue
                    elif len(value) == 1 and 'date' in value and isinstance(field, DateTimeField):
                        processed_data[name] = value['date']
                        continue
                processed_data[name] = value
            return processed_data

        class Meta:
            model: Model = None
            fields = '__all__'

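However, it does something strange: on the first run, against an empty database, it only creates the last, most deeply nested object. On the second run it does nothing and returns a code='unique' validation error saying that the object already exists.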
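One plausible cause of the first symptom: `self.Meta.model = ...` in `__init__` does not create a per-instance setting; it mutates the single `Meta` class shared by every `RecursiveSerializer`, so whichever nested serializer is constructed last decides the model for all of them, including the root. The plain-Python reproduction below (no DRF involved; `Shared` and `PerInstance` are hypothetical names) shows the effect and one way around it, giving each instance its own `Meta` subclass. Whether DRF then validates the nested data as intended is a separate question that this sketch does not address.

```python
class Shared:
    class Meta:
        model = None

    def __init__(self, model_name, nested=()):
        self.Meta.model = model_name  # writes to Shared.Meta, visible to every instance
        self.nested = [Shared(n) for n in nested]


class PerInstance:
    class Meta:
        model = None

    def __init__(self, model_name, nested=()):
        # Shadow the class attribute with a fresh subclass owned by this instance only.
        self.Meta = type("Meta", (PerInstance.Meta,), {"model": model_name})
        self.nested = [PerInstance(n) for n in nested]


root = Shared("CommunicationRecord", nested=["Branch", "Operation"])
print(root.Meta.model)  # Operation: the last nested constructor won

root = PerInstance("CommunicationRecord", nested=["Branch", "Operation"])
print(root.Meta.model)  # CommunicationRecord
print(root.nested[0].Meta.model)  # Branch
```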

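I must say I'm new to Python and Django (I come from .NET development), and the difficulty I'm having with this task feels embarrassing. I've been reading the Django and DRF documentation, which has helped me less than I expected. But I refuse to believe that this language and framework lack the resources to perform such a trivial operation. So if, for lack of knowledge, I seem to be missing something very obvious, I'd be grateful if someone taught me what I apparently don't know here.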


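In general, here is why I agree that DRF is not useful for this case. DRF defines an API and in many respects resembles a view more than a model: it defines which part of the data should be exported. It can support all CRUD operations on the same data structure, and there can be several APIs over the same data. It is therefore normal for serializers to be separate from models. It is also often necessary not to change anything in the models when a third-party package is to become part of the new API. You only need create and update (no read or delete), and you confirmed that you don't need any complex security restrictions.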

EDIT

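The main function of my code for your updated JSON and models is very similar to yours, so there is little point in rewriting it. I'd rather add more comments and change less code, because otherwise the models and JSON could keep growing just to explain why you ignore some errors. The important new information for me in your assignment is: 1. the JSON contains data for the "through" entities of all relationships (it never did before); 2. the root entity's changedAt timestamp is updated whenever any nested entity in the JSON changes, including all intermediate entities and even the "through" entities.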
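Point 2 allows a cheap shortcut: if the stored root's timestamp is not older than the incoming one, nothing below it can have changed either, so the whole tree can be skipped with a single comparison. A minimal sketch, assuming the timestamp is the payload's `updatedAt` and that both sides use the service's `"YYYY-MM-DD HH:MM:SS"` format in the same time zone; `needs_import` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def needs_import(stored_updated_at: Optional[str], payload: Dict[str, Any]) -> bool:
    """Decide once, at the root, whether the nested tree has to be walked at all.

    Relies on point 2 above: any change deep in the tree also bumps the root's
    timestamp, so an unchanged root timestamp implies an unchanged tree.
    """
    if stored_updated_at is None:
        return True  # root id not in the database yet
    incoming = payload["updatedAt"]["date"]
    # Fixed-width "YYYY-MM-DD HH:MM:SS" strings in one time zone sort chronologically.
    return incoming > stored_updated_at


root = {"id": 37125965, "updatedAt": {"date": "2019-12-20 09:01:32"}}
print(needs_import(None, root))  # True
print(needs_import("2019-12-20 09:01:32", root))  # False
print(needs_import("2019-12-20 09:01:14", root))  # True
```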

    from datetime import datetime

    from django.db import models
    from django.utils import timezone


    def parse_date(value: dict) -> datetime:
        """Convert the service's {"date": "YYYY-MM-DD HH:MM:SS"} wrapper to an aware datetime."""
        date_value = datetime.strptime(value['date'], '%Y-%m-%d %H:%M:%S')
        if timezone.is_naive(date_value):
            date_value = timezone.make_aware(date_value)
        return date_value


    class UpdateableModel(models.Model):
        class Meta:
            abstract = True

        @classmethod
        def creupdate(cls, data: dict, save_main_instance: bool = True, no_optimization: bool = False):
            primary_key_name = cls._meta.pk.name
            if primary_key_name not in data:
                raise ValueError(f"parameter 'data' must contain '{primary_key_name}' key (model's primary key).")
            try:
                instance = cls.objects.get(pk=data[primary_key_name])
                at_instance = getattr(instance, 'atualizadoEm', None)
                at_data = data.get('atualizadoEm', None)
                if isinstance(at_data, dict):
                    at_data = parse_date(at_data)  # compare datetime with datetime, not with the raw dict
                operation = 'unchanged' if at_instance and at_data and at_instance >= at_data else 'updated'
                if operation == 'unchanged' and not no_optimization:
                    print(f'{operation} instance {primary_key_name} {instance.pk} from {instance._meta.model}')
                    return instance
            except cls.DoesNotExist:
                instance = cls()
                operation = 'created'
            many_to_many_instances = []
            for name, value in data.items():
                if isinstance(value, dict):
                    if len(value) == 1 and 'date' in value:
                        new_value = parse_date(value)
                    else:
                        # Nested object: create/update the referenced row first (foreign key).
                        foreign_model = cls._meta.get_field(name).related_model
                        new_value = foreign_model.creupdate(value)
                elif isinstance(value, list):
                    # Array of objects: reverse foreign key or many-to-many manager.
                    remote_field = getattr(instance, name)
                    remote_objs = []
                    for remote_data in value:
                        assert isinstance(remote_data, dict) and remote_field.model._meta.pk.name in remote_data
                        # Related rows must be saved, otherwise add() below references missing rows.
                        remote_objs.append(remote_field.model.creupdate(remote_data))
                    many_to_many_instances.append((remote_field, remote_objs))
                    continue  # a list is never assigned with setattr()
                else:
                    new_value = value
                if operation != 'unchanged':
                    setattr(instance, name, new_value)
            if save_main_instance and operation != 'unchanged':
                instance.save()
                print(f'{operation} instance {primary_key_name} {instance.pk} from {instance._meta.model}')
            # One add() per relation; saved instances work for both M2M and reverse-FK managers.
            for remote_field, remote_objs in many_to_many_instances:
                remote_field.add(*remote_objs)
            return instance

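Notes:

  • Many-to-many relationships are optimized to add all objects with one request, to minimize the number of saves when nothing has changed. (This was written for the previous JSON structure, without any explicit "through" data.)

  • Added an assertion instead of try ... except ValueError: pass (or FieldDoesNotExist). "Errors should never pass silently." (The Zen of Python), especially during development. (An unknown "through" name is an error just like an unknown plain attribute.)

  • Added the parameter no_optimization, and kept my logic of using the "modifiedAt" timestamp only for the same entity, without skipping the checks of related entities. If an update was wrongly ignored because of an error or an incorrectly ignored FieldDoesNotExist, the database state can later be corrected by replaying the data with no_optimization=True. If all entities have timestamps, the import is even idempotent and can be replayed in any order, e.g. by re-running the data of a period in which errors occurred. This is also very useful for checking that the optimization is correct: process the data with and without optimization and verify that the databases end up in the same state, for example by comparing exported SQL dumps. My experience is that optimizations relying too heavily on timestamps cause problems later if there is no alternative path.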
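The idempotency claim in the last note can be illustrated without a database: if every change carries a timestamp and an older version never overwrites a newer one, replaying the same changes again, or in a different order, ends in the same state. A minimal sketch under those assumptions; `apply_change` and `replay` are hypothetical, and the "OLD NAME" row is invented for illustration. Ties (two different versions with the same timestamp) would break this property.

```python
import random
from typing import Dict, List, Tuple

Change = Tuple[int, str, str]  # (id, updatedAt, name)
State = Dict[int, Tuple[str, str]]  # id -> (updatedAt, name)


def apply_change(db: State, change: Change) -> None:
    pk, updated_at, name = change
    current = db.get(pk)
    # Keep what is stored unless the incoming version is strictly newer.
    if current is None or updated_at > current[0]:
        db[pk] = (updated_at, name)


def replay(changes: List[Change]) -> State:
    db: State = {}
    for change in changes:
        apply_change(db, change)
    return db


changes = [
    (20, "2013-02-14 23:12:30", "OLD NAME"),  # hypothetical older version
    (20, "2019-05-09 13:40:47", "REGIONAL OFFICE #3"),
    (311, "2019-07-05 16:40:34", "PROTOCOL"),
]
expected = replay(changes)
messy = changes * 2  # replay everything twice ...
random.shuffle(messy)  # ... in random order
assert replay(messy) == expected
```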
OK, so I gave up on DRF and simply created an abstract model that my other models extend, giving it the functionality I need, implemented as shown below.

    from datetime import datetime

    from django.core.exceptions import FieldDoesNotExist  # canonical location of this exception
    from django.db import models
    from django.utils import timezone


    def parse_date(value: dict) -> datetime:
        """Convert the service's {"date": "YYYY-MM-DD HH:MM:SS"} wrapper to an aware datetime."""
        date_value = datetime.strptime(value['date'], '%Y-%m-%d %H:%M:%S')
        if timezone.is_naive(date_value):
            date_value = timezone.make_aware(date_value)
        return date_value


    class UpdateableModel(models.Model):
        class Meta:
            abstract = True

        @classmethod
        def creupdate(cls, data: dict, save_main_instance: bool = True):
            primary_key_name = cls._meta.pk.name
            if primary_key_name not in data:
                raise ValueError(f"parameter 'data' must contain '{primary_key_name}' key (model's primary key).")
            try:
                instance = cls.objects.get(pk=data[primary_key_name])
                at_instance = getattr(instance, 'atualizadoEm', None)
                at_data = data.get('atualizadoEm', None)
                if isinstance(at_data, dict):
                    at_data = parse_date(at_data)  # compare datetime with datetime, not with the raw dict
                if at_instance and at_data and at_instance >= at_data:
                    print(f'unchanged instance {primary_key_name} {instance.pk} from {instance._meta.model}')
                    return instance
                operation = 'updated'
            except cls.DoesNotExist:
                instance = cls()
                operation = 'created'
            many_to_many_instances = []
            for name, value in data.items():
                if isinstance(value, dict):
                    if len(value) == 1 and 'date' in value:
                        setattr(instance, name, parse_date(value))
                    else:
                        # Nested object: create/update the referenced row first (foreign key).
                        foreign_model = cls._meta.get_field(name).related_model
                        setattr(instance, name, foreign_model.creupdate(value))
                elif isinstance(value, list):
                    try:
                        relation_field = cls._meta.get_field(name)
                    except FieldDoesNotExist:
                        relation_field = None
                    if relation_field:
                        for through_data in value:
                            try:
                                manager = getattr(instance, name)
                                through_model = manager.through
                                if isinstance(through_data, dict) and through_model._meta.pk.name in through_data:
                                    through_instance = through_model.creupdate(through_data, False)
                                    # The payload's "through" rows lack the back-reference (e.g. Subject.folder).
                                    setattr(through_instance, manager.source_field_name, instance)
                                    many_to_many_instances.append(through_instance)
                            except ValueError:
                                pass
                else:
                    setattr(instance, name, value)
            if save_main_instance:
                instance.save()
                print(f'{operation} instance {primary_key_name} {instance.pk} from {instance._meta.model}')
            # Save "through" rows only after the main instance they point to.
            for many_to_many_instance in many_to_many_instances:
                many_to_many_instance.save()
            return instance

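Now, although it works (I have just used it to import a lot of data), I am not marking it as the answer for now, for two reasons:

  1. I am open to criticism of my implementation that points out flaws and ways to make it more robust and better optimized.

  2. I still hope there is a better solution than mine. If a few months go by and nothing turns up, I will assume there isn't one and accept my own answer.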
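One possible optimization: in the sample payload the same rows appear more than once (branch 20 at the root and again under folder.section, operation 12397740 at the root and again under folder), and each repeat costs at least one more query. A per-run cache keyed by (model, id) avoids that. A minimal sketch, assuming a hypothetical `ImportSession` object passed down through the recursion, with the model class as part of the key; `fake_save` stands in for `creupdate()`. Caveat: the first occurrence wins, and the folder's copy of operation omits operationClass and evaluator, so this only works if the richer copy is seen first (true in this payload, where the root's operation key comes before folder).

```python
from typing import Any, Callable, Dict, List, Tuple


class ImportSession:
    """Hypothetical per-run cache: each (model, id) pair is saved at most once."""

    def __init__(self, save: Callable[[Any, Dict[str, Any]], Any]) -> None:
        self._save = save
        self._done: Dict[Tuple[Any, Any], Any] = {}

    def get_or_save(self, model: Any, data: Dict[str, Any]) -> Any:
        key = (model, data["id"])
        if key not in self._done:
            self._done[key] = self._save(model, data)  # first occurrence wins
        return self._done[key]


calls: List[Tuple[Any, Any]] = []


def fake_save(model: Any, data: Dict[str, Any]) -> Any:
    calls.append((model, data["id"]))  # stands in for creupdate()
    return data["id"]


session = ImportSession(fake_save)
branch = {"id": 20, "name": "REGIONAL OFFICE #3"}
session.get_or_save("Branch", branch)  # root-level "branch"
session.get_or_save("Branch", branch)  # folder.section.branch
print(calls)  # [('Branch', 20)]
```

In the Django code above, the cache would have to be threaded through every recursive `creupdate()` call.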

python json django deserialization json-deserialization

© www.soinside.com 2019 - 2024. All rights reserved.