What is an efficient way to reduce and merge a list(list(dict())) where some dictionaries may share the same key but have different values?

Question

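Context: I have a list of organization IDs, where each organization ID has multiple account ID and email pairs. Within an organization, each email is associated with exactly one account ID (unique). Not every email is in every organization, but some emails are in several organizations, or even in all of them. For example, if there are 5 organizations, each organization has an unknown number of (account ID, email) pairs. Account IDs are unique regardless of which organization they belong to, but some emails appear in multiple organizations with a different account ID in each one.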

My data has the following structure, and I am trying to do this in Python:

# Note: Each AccountID Value is unique across the board
# Note: Emails are unique per organization, but can be in multiple organizations.
[
    [
        # The value for OrganizationID is the same throughout the list of dictionaries.
        {
            "some-email A": "AccountID",
            "OrganizationID": "Organization A" # <- The ID is just a string of numbers.
        },
        {
            "some-email B": "AccountID",
            "OrganizationID": "Organization A"
        },
        {
            "some-email C": "AccountID",
            "OrganizationID": "Organization A"
        },
        ...
    ],
    ...
    [
        {
            "some-email C": "AccountID", # <- Also in organization A but different Account ID
            "OrganizationID": "Organization LK"
        },
        {
            "some-email K": "AccountID",
            "OrganizationID": "Organization LK"
        },
        ...
    ],
    ...
]

Order does not matter! My end goal is to convert it into the following new data structure.

# Note: Reference is just a list of strings where each string is 
# a concatenation of the "OrganizationID:AccountID" of the respective email.
[
    {
        "Email": "some-email A",
        "Reference": [
            "[Organization A]:[Account ID of "some-email A" in Organization A if exists]",
            ...
            "[Organization X]:[Account ID of "some-email A" in Organization X if exists]",
            ...
        ]
    },
    ...
    {
        "Email": "some-email C",
        "Reference": [
            "[Organization A]:[Account ID of "some-email C" in Organization A if exists]",
            ...
            "[Organization LK]:[Account ID of "some-email C" in Organization LK if exists]",
            ...
        ]
    },
]
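My current data set has more than 1,000 organizations, each with an arbitrary number of accounts. Some organizations may have only one or two accounts, while others have more than 600. No organization has zero accounts.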

Edit: My current solution is below, but I would like to see whether there is a more efficient way to solve this.

re = list()
seen = set()
for _p in dt: # <- this is the first data set list(list(dict()))
    for x in _p: # <- Each dictionary in the list(dict())
        em = next(k for k in x if k != "OrganizationID") # <- some-email key (the one key that is not "OrganizationID", whatever its position)
        if em not in seen:
            seen.add(em)
            re.append({
                "Email": em,
                "Reference": [x["OrganizationID"] + ":" + x[em]]
            })
        else:
            d = next(i for i in re if i['Email'] == em)
            d["Reference"].append(x["OrganizationID"] + ":" + x[em])


Answer 1

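Because of the way the data is structured, what you are doing will require nested for loops, but I think you will get better performance if you remove the `if em not in seen` clause, since that check requires its own lookup against a set that otherwise would not need to exist. Not having to build the set in the first place reduces overhead. Here is my approach: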
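
The answer's code did not survive in this copy, so what follows is not the answerer's original. It is only a minimal sketch of one way to drop the `seen` check, assuming each inner dictionary holds exactly one email key plus `"OrganizationID"`. The function name `merge_by_email` and the sample data are hypothetical and made up for illustration.

```python
from collections import defaultdict


def merge_by_email(dt):
    """Group "OrganizationID:AccountID" strings by email address."""
    refs = defaultdict(list)  # email -> list of "OrganizationID:AccountID"
    for org in dt:            # each organization: a list of dicts
        for entry in org:     # each dict: one email key plus "OrganizationID"
            org_id = entry["OrganizationID"]
            for key, account_id in entry.items():
                if key != "OrganizationID":
                    refs[key].append(f"{org_id}:{account_id}")
    # Convert to the list-of-dicts shape described in the question.
    return [{"Email": email, "Reference": ref} for email, ref in refs.items()]


# Hypothetical sample data, not taken from the question.
sample = [
    [
        {"a@example.com": "101", "OrganizationID": "1"},
        {"c@example.com": "102", "OrganizationID": "1"},
    ],
    [
        {"c@example.com": "201", "OrganizationID": "2"},
    ],
]
result = merge_by_email(sample)
```

Keying a dictionary by email makes each lookup constant time on average, so the total work grows with the number of entries. In the question's code, by contrast, the `next(i for i in re if i['Email'] == em)` call scans the result list every time an email repeats, which is likely the larger cost.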
