What is an efficient way to reduce and merge a list(list(dict())) where some dictionaries may have the same key but different values?

Question (votes: 0, answers: 1)

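Context: I have a list of organization IDs, where each organization ID has multiple AccountID/email pairs. Within a given organization, each email is associated with exactly one AccountID. Not every email appears in every organization, but some emails appear in several organizations, or even in all of them. For example, if there are 5 organizations, each one has an unknown number of AccountID/email pairs. AccountIDs are unique regardless of which organization they belong to, but some emails are associated with different AccountIDs in different organizations.

My data has the following structure, and I am trying to do this in Python: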

# Note: Each AccountID Value is unique across the board
# Note: Emails are unique per organization, but can be in multiple organizations.
[
    [
        # The value for OrganizationID is the same throughout the list of dictionaries.
        {
            "some-email A": "AccountID",
            "OrganizationID": "Organization A"  # <- The ID is just a string of numbers.
        },
        {
            "some-email B": "AccountID",
            "OrganizationID": "Organization A"
        },
        {
            "some-email C": "AccountID",
            "OrganizationID": "Organization A"
        },
        ...
    ],
    ...
    [
        {
            "some-email C": "AccountID",   #. <- Also in organization A but different Account ID
            "OrganizationID": "Organization LK"
        },
        {
            "some-email K": "AccountID",
            "OrganizationID": "Organization LK"
        },
        ...
    ],
    ...
]

Order does not matter! My end goal is to convert this into the following new data structure.

# Note: Reference is just a list of strings where each string is 
# a concatenation of the "OrganizationID:AccountID" of the respective email.
[
    {
        "Email": "some-email A",
        "Reference": [
            "[Organization A]:[Account ID of "some-email A" in Organization A if exists]",
            ...
            "[Organization X]:[Account ID of "some-email A" in Organization X if exists]",
            ...
        ]
    },
    ...
    {
        "Email": "some-email C",
        "Reference": [
            "[Organization A]:[Account ID of "some-email C" in Organization A if exists]",
            ...
            "[Organization LK]:[Account ID of "some-email C" in Organization LK if exists]",
            ...
        ]
    },
]

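My current data set has more than 1,000 organizations, each with an arbitrary number of accounts. Some organizations may have only one or two accounts, while others have more than 600. No organization has zero accounts.

Edit: My current solution is below, but I would like to see whether there is a more efficient way to approach this.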

result = []
seen = set()
for _p in dt:  # <- this is the first data set list(list(dict()))
    for x in _p:  # <- Each dictionary in the list(dict())
        # The email is the one key that is not "OrganizationID"
        em = next(k for k in x if k != "OrganizationID")
        ref = x["OrganizationID"] + ":" + x[em]
        if em not in seen:
            seen.add(em)
            result.append({"Email": em, "Reference": [ref]})
        else:
            # Linear scan over result to find the existing entry for this email
            d = next(i for i in result if i["Email"] == em)
            d["Reference"].append(ref)


python structure
1 Answer
0 votes

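Because of the way the data is structured, what you are doing will need nested for loops, but I think you will get better performance if you drop the `if em not in seen` clause, since it requires its own lookup in a set that would otherwise not need to exist. Not having to build that set in the first place reduces overhead. Here is my approach: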
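A minimal sketch of that idea, not necessarily the answerer's exact code: it assumes each inner dictionary holds exactly one email key plus `"OrganizationID"`, as in the question. A `defaultdict` keyed by email replaces both the `seen` set and the linear `next(...)` scan, so each entry is handled in roughly constant time. The function name `merge_by_email` and the sample data are hypothetical, made up for illustration.

```python
from collections import defaultdict


def merge_by_email(dt):
    """Group "OrganizationID:AccountID" strings by email in a single pass."""
    refs = defaultdict(list)  # email -> list of "OrganizationID:AccountID"
    for org in dt:            # each organization: a list of dicts
        for entry in org:     # each dict: one email key plus "OrganizationID"
            org_id = entry["OrganizationID"]
            for key, account_id in entry.items():
                if key != "OrganizationID":
                    refs[key].append(f"{org_id}:{account_id}")
    # Convert the mapping into the target list-of-dicts structure.
    return [{"Email": email, "Reference": r} for email, r in refs.items()]


# Hypothetical sample data, invented for this example only.
sample = [
    [
        {"a@example.com": "1001", "OrganizationID": "111"},
        {"c@example.com": "1003", "OrganizationID": "111"},
    ],
    [
        {"c@example.com": "2003", "OrganizationID": "222"},
    ],
]

result = merge_by_email(sample)
print(result)
```

The nested loops remain, since every dictionary has to be visited once, but the per-entry work no longer grows with the number of emails already collected.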
