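I wrote this function to merge tax values according to a specific logic. It iterates over the tax dictionary looking for keys that share a country-code suffix and have overlapping values. When such keys are found, their values are merged and the duplicate key is removed from the dictionary.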
def merge_tax_values_new_logic(tax_dict):
    treated_list = set()
    while True:
        changed = False
        for key1, value1 in list(tax_dict.items()):
            country_code = key1[-2:]
            print('current list :', tax_dict)
            if key1 not in treated_list:
                print('current iteration key :', key1)
                for key2, value2 in list(tax_dict.items()):
                    if key2.endswith(country_code) and key1 != key2 and any(hl_id in value2 for hl_id in value1):
                        tax_dict[key1].extend(value2)
                        tax_dict.pop(key2)
                        tax_dict[key1] = list(set(tax_dict[key1]))
                        changed = True
                        print('current key : ', key1, 'matched with key : ', key2, 'state of the dict after the pop : ', tax_dict)
                        break
                treated_list.add(key1)
            print('treated list :', treated_list)
            print('******************************')
            if changed:
                break
        if not changed:
            break
    return tax_dict
Example:
new_tax_dict = {'tax1_US':['A'],'tax2_US':['B'], 'tax3_US':['A','B']}
merge_tax_values_new_logic(new_tax_dict)
Result:
current list : {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
current iteration key : tax1_US
current key : tax1_US matched with key : tax3_US state of the dict after the pop : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
treated list : {'tax1_US'}
******************************
current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
treated list : {'tax1_US'}
******************************
current list : {'tax1_US': ['A', 'B'], 'tax2_US': ['B']}
current iteration key : tax2_US
current key : tax2_US matched with key : tax1_US state of the dict after the pop : {'tax2_US': ['A', 'B']}
treated list : {'tax2_US', 'tax1_US'}
******************************
current list : {'tax2_US': ['A', 'B']}
treated list : {'tax2_US', 'tax1_US'}
******************************
{'tax2_US': ['A', 'B']}
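It works well for small dictionaries with only a few keys. However, performance becomes a real problem when the function has to process a dictionary with a large number of keys (40k+ keys, with an average of 5 values per key).
Do you see any alternatives?
Regards,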
I would suggest getting a list of the tax keys sorted by country code, i.e. by their last two characters.
keys = sorted(data.keys(), key = lambda x: x[-2:])
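To make the effect of that sort visible, here is a small sketch that groups the sorted keys by country with `itertools.groupby`. The helper name `group_keys_by_country` and the sample data are made up for illustration and are not part of the original question.

```python
from itertools import groupby


def country_of(key):
    # The country code is assumed to be the last two characters of the key.
    return key[-2:]


def group_keys_by_country(data):
    # groupby only groups adjacent items, so the keys must be sorted first.
    keys = sorted(data, key=country_of)
    return {country: list(group) for country, group in groupby(keys, key=country_of)}


sample = {'tax1_US': ['A'], 'tax9_FR': ['C'], 'tax2_US': ['B']}
print(group_keys_by_country(sample))
# {'FR': ['tax9_FR'], 'US': ['tax1_US', 'tax2_US']}
```

Because the sort is stable, keys within one country keep their original relative order.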
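You can then iterate over all the keys in a single pass, relying on the fact that the sort has placed keys with the same country code next to each other.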
def merge_tax_values_new_logic(tax_dict):
    def get_country(key):
        return key[-2:]

    # nothing to merge in an empty dict (keys[0] would raise IndexError)
    if not tax_dict:
        return tax_dict

    # sorting keys and thus grouping keys by country code
    keys = sorted(tax_dict.keys(), key=get_country)
    group_key = keys[0]
    group_country = get_country(group_key)
    for key in keys[1:]:
        country_code = get_country(key)
        if country_code == group_country:
            # same country as the current group: fold this key into the group key
            # (note: this merges every key of a country, without an overlap check)
            tax_dict[group_key].extend(tax_dict.pop(key))
        else:
            # a new country starts here
            group_key = key
            group_country = country_code
    return tax_dict
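Note that the single-pass version above merges all keys of a country, whether or not their values overlap. If the overlap condition from the question has to be kept, one possible alternative is a union-find (disjoint-set) structure, sketched below. This is only a sketch under assumptions: it assumes the intent is that keys of the same country that are linked through shared values, directly or through a chain of other keys, end up in one entry, and that the values are hashable and sortable. The function name `merge_tax_values_union_find` is hypothetical. Which key survives and the order of the merged values may differ from the original function, and the input dictionary is not modified.

```python
from collections import defaultdict


def merge_tax_values_union_find(tax_dict):
    # Every key starts as the root of its own group.
    parent = {key: key for key in tax_dict}

    def find(key):
        # Follow parent links to the root, shortening the path on the way.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    # Maps (country, value) to the first key seen with that pair.
    first_seen = {}
    for key, values in tax_dict.items():
        country = key[-2:]
        for value in values:
            other = first_seen.setdefault((country, value), key)
            root_a, root_b = find(other), find(key)
            if root_a != root_b:
                # Same country and a shared value: join the two groups.
                parent[root_b] = root_a

    # Collect the values of each group under its root key.
    merged = defaultdict(set)
    for key, values in tax_dict.items():
        merged[find(key)].update(values)
    return {key: sorted(values) for key, values in merged.items()}


new_tax_dict = {'tax1_US': ['A'], 'tax2_US': ['B'], 'tax3_US': ['A', 'B']}
print(merge_tax_values_union_find(new_tax_dict))
```

Each (key, value) pair is visited a constant number of times, so the work grows roughly linearly with the total number of values instead of comparing every key against every other key.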