每次迭代时获取列表中出现次数最多的元素的计数

问题描述 投票:0回答:1

我需要在处理列表的每次迭代中获取列表中出现次数最多的 ID。它是一个包含正/负 ID 的巨大列表。如果 ID 为正,则增加该 ID 的计数;如果为负,则减少该 ID 的计数。 考虑输入列表如下:

input_list = [6,6,6,2,2,-6,-6,-6,-2,-2]

input_list[0] = 6 -> Count of ID 6:1 ->max count at this iteration is 1
input_list[1] = 6 -> Count of ID 6:2 ->max count at this iteration is 2
input_list[2] = 6 -> Count of ID 6:3 ->max count at this iteration is 3
input_list[3] = 2 -> Count of ID 6:3, Count of ID 2:1 ->max count at this iteration is 3
input_list[4] = 2 -> Count of ID 6:3, Count of ID 2:2 ->max count at this iteration is 3
input_list[5] = -6 -> Count of ID 6:2, Count of ID 2:2 ->max count at this iteration is 2
input_list[6] = -6 -> Count of ID 6:1, Count of ID 2:2 ->max count at this iteration is 1
and so on..

预期输出:

[1,2,3,3,3,2,2,2,1,0]

注:-6表示ID 6的计数减1。

def getMAxIDAfterEachUpdate(input_list):
    res = []
    dic = {}
    for i in range(len(input_list)):
        tmp = abs(input_list[i])
        if tmp not in dic:
            tmp = 1
        else:
            # ID is negative, we decrement its count
            if input_list[i] < 0:
                dic[tmp] -= 1
            else: #its positive, increment its count
                dic[tmp] += 1
        res.append(max(dic.values())
    return res

每次迭代时在字典中查找最大值的成本很高。有更好的方法找到结果吗?

python dictionary iteration max
1个回答
0
投票

使用简单的字典来存储每个ID的计数的方法的主要问题是字典值是无序的,因此在每次迭代中必须对字典值执行完整扫描以获得最大值,从而导致时间复杂度O(n x m),其中 n 是输入列表的大小,m 是唯一 ID 的数量。

由于我们知道给定 ID 的计数在每次迭代中只能递增或递减 1,因此优化算法的一种方法是将 ID 计数以及共享该计数的 ID 数量存储在 a 的节点中。改为双链表,每次迭代时将节点中存储的计数按升序排列,这样需要O(1)的时间复杂度来获取最大计数,并且该最大计数总是在最后一个节点,从而提高了总体时间复杂度为 O(n)

其中

nodes
是将 ID 映射到节点的字典,
counts
是双向链表,其中每个节点存储一对共享计数的 ID 的
count
number
,使用
 的示例输入列表[6, 6, 6, 2, 2, -6, -6, -6, -2, -2]
说明每次迭代结束时的结果:

Iteration 1, ID = 6:
    nodes = {6: node1}
    counts = [node1(count=1, number=1)]
Iteration 2, ID = 6:
    nodes = {6: node1}
    counts = [node1(count=2, number=1)]
Iteration 3, ID = 6:
    nodes = {6: node1}
    counts = [node1(count=3, number=1)]
Iteration 4, ID = 2:
    nodes = {6: node1, 2: node2}
    counts = [node2(count=1, number=1), node1(count=3, number=1)]
Iteration 5, ID = 2:
    nodes = {6: node1, 2: node2}
    counts = [node2(count=2, number=1), node1(count=3, number=1)]
Iteration 6, ID = -6:
    nodes = {6: node2, 2: node2}
    counts = [node2(count=2, number=2)]
Iteration 7, ID = -6:
    nodes = {6: node1, 2: node2}
    counts = [node1(count=1, number=1), node2(count=2, number=1)]
Iteration 8, ID = -6:
    nodes = {6: node1, 2: node2}
    counts = [node1(count=0, number=1), node2(count=2, number=1)]
Iteration 9, ID = -2:
    nodes = {6: node1, 2: node2}
    counts = [node1(count=0, number=1), node2(count=1, number=1)]
Iteration 10, ID = -2:
    nodes = {6: node1, 2: node1}
    counts = [node1(count=0, number=2)]

这是使用

llist
模块实现双链表的算法:

from llist import dllist
from collections import namedtuple

count_number = namedtuple('count_number', ('count', 'number'))

def max_count_per_update(ids):
    max_counts = []
    nodes = {}
    counts = dllist()
    for id in ids:
        if not (node := nodes.get(key := abs(id))):
            if counts and (first := counts.first).value.count == 0:
                node = first
            else:
                node = counts.appendleft(count_number(0, 1))
        new_count = (current_node := node).value.count + (-1 if id < 0 else 1)
        next_node = node.prev if id < 0 else node.next
        if next_node and (next_value := next_node.value).count == new_count:
            (node := next_node).value = count_number(new_count, next_value.number + 1)
        else:
            insert = counts.insertbefore if id < 0 else counts.insertafter
            node = insert(count_number(new_count, 1), node)
        if new_number := (current_value := current_node.value).number - 1:
            current_node.value = count_number(current_value.count, new_number)
        else:
            counts.remove(current_node)
        nodes[key] = node
        max_counts.append(counts.last.value.count)
    return max_counts

这样:

input_list = [6, 6, 6, 2, 2, -6, -6, -6, -2, -2]
print(max_count_per_update(input_list))

输出:

[1, 2, 3, 3, 3, 2, 2, 2, 1, 0]

演示:https://replit.com/@blhsing1/SereneLimeProfiler#main.py

© www.soinside.com 2019 - 2024. All rights reserved.