Iterating over a large dictionary in Python

Question

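I wrote this function to merge tax values according to a specific logic. It iterates over a tax dictionary and looks for keys that share the same country-code suffix and have overlapping values. When such keys are found, their values are merged and the duplicate key is removed from the dictionary.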

def merge_tax_values_new_logic(tax_dict):
    treated_list = set()
    while True:
        changed = False
        for key1, value1 in list(tax_dict.items()):
            country_code = key1[-2:]
            print('current list :',tax_dict)
            if key1 not in treated_list:
                print('current iteration key :' , key1) 
                for key2, value2 in list(tax_dict.items()):
                    if key2.endswith(country_code) and key1 != key2 and any(hl_id in value2 for hl_id in value1):
                        tax_dict[key1].extend(value2)
                        tax_dict.pop(key2)
                        tax_dict[key1] = list(set(tax_dict[key1]))
                        changed = True
                        print( 'current key : ' , key1 , 'matched  with key : ' , key2  ,  'state  of the dict after the pop : ', tax_dict)
                        break
            treated_list.add(key1)
            print('treated list :', treated_list)
            print('******************************')
            if changed:
                break
        if not changed:
            break
    return tax_dict

Example:

new_tax_dict = {'tax1_US':['A'],'tax2_US':['B'], 'tax3_US':['A','B']}
merge_tax_values_new_logic(new_tax_dict)

Result:

    current list : {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
    current iteration key : tax1_US
    current key :  tax1_US matched  with key :  tax3_US state  of the dict after the pop :  {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    treated list : {'tax1_US'}
    ******************************
    current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
    current iteration key : tax2_US
    current key :  tax2_US matched  with key :  tax1_US state  of the dict after the pop :  {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    current list : {'tax2_US': ['A', 'B']}
    treated list : {'tax2_US', 'tax1_US'}
    ******************************
    {'tax2_US': ['A', 'B']}

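It works fine for small dictionaries with only a few keys. However, when the function has to process a dictionary with a large number of keys (40k+ keys, with an average of 5 values per key), performance becomes a real problem, because every merge restarts the scan over the whole dictionary.

Do you see any alternatives?

Regards,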

Answer 1

I suggest building a sorted list of the tax codes, sorted by their last two characters (the country code).

keys = sorted(data.keys(), key = lambda x: x[-2:])

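Then you can iterate over all the keys in a single pass, relying on the fact that keys with the same country code are now adjacent.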
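To make the effect of the sort concrete, here is a minimal sketch that groups keys by their suffix with `itertools.groupby`. The sample dictionary and the helper name `country_of` are made up for illustration and are not part of the original post.

```python
from itertools import groupby


def country_of(key):
    # Assumption: the country code is always the last two characters of the key.
    return key[-2:]


# Hypothetical sample data, only for illustration.
tax_dict = {'tax1_US': ['A'], 'tax1_FR': ['C'], 'tax2_US': ['B']}

# Sorting by country code makes keys of the same country adjacent,
# which is what groupby needs in order to form one group per country.
keys = sorted(tax_dict, key=country_of)
groups = {country: list(group) for country, group in groupby(keys, key=country_of)}

print(groups)
```

Because `sorted` is stable, keys within one country keep their original relative order.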

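def merge_tax_values_new_logic(tax_dict):

    def get_country(key):
        # The country code is the last two characters of the key.
        return key[-2:]

    # Nothing to merge in an empty dictionary.
    if not tax_dict:
        return tax_dict

    # Sorting the keys groups them by country code.
    keys = sorted(tax_dict.keys(), key=get_country)

    group_key = keys[0]
    group_country = get_country(group_key)
    for key in keys[1:]:
        country_code = get_country(key)

        if country_code == group_country:
            # Same country as the current group: fold the values into the group key.
            # Note: this merges every key of a country, without checking
            # whether the values actually overlap.
            tax_dict[group_key].extend(tax_dict.pop(key))
        else:
            # A different country starts a new group.
            group_key = key
            group_country = country_code

    return tax_dict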
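If the overlap condition from the question has to be kept, one possible alternative is a disjoint-set (union-find) pass. The sketch below is not from the original answer. It assumes the intent is that keys with the same country suffix that are linked through any chain of shared values end up in one group. That is an assumption: the function in the question can leave some overlapping keys unmerged once both are in `treated_list`, and it keeps a different surviving key name (`tax2_US` in the example), so results may differ. The function name `merge_overlapping_tax_values` is hypothetical, and the values are assumed to be hashable and sortable.

```python
def merge_overlapping_tax_values(tax_dict):
    """Merge keys that share a 2-character suffix and are linked by common values.

    Returns a new dictionary; the input is not modified.
    """
    order = {key: i for i, key in enumerate(tax_dict)}
    parent = {key: key for key in tax_dict}

    def find(key):
        # Walk up to the group representative, shortening the path on the way.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    # For each (country code, value) pair, remember the first key that held it.
    first_owner = {}
    for key, values in tax_dict.items():
        country = key[-2:]
        for value in values:
            owner = first_owner.setdefault((country, value), key)
            root_a, root_b = find(owner), find(key)
            if root_a != root_b:
                # Keep the earliest key of the dictionary as the representative.
                if order[root_a] > order[root_b]:
                    root_a, root_b = root_b, root_a
                parent[root_b] = root_a

    # Collect the values of every group under its representative key.
    merged = {}
    for key, values in tax_dict.items():
        merged.setdefault(find(key), set()).update(values)
    return {key: sorted(values) for key, values in merged.items()}


new_tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
print(merge_overlapping_tax_values(new_tax_dict))
```

Each value is looked at once, so the work grows roughly with the total number of values instead of with repeated scans over all pairs of keys.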
