Python: get the N largest values from a dictionary [duplicate]

Question

This question already has answers here:

5 maximum values in a python dictionary (5 answers)
top values from dictionary (4 answers)

Suppose we have a dictionary:

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

I want another dictionary containing the 4 elements with the largest values. For example, I expect to get:

subitems = {'e': 24, 'g': 24, 'b': 12, 'f': 10}

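What would be the most Pythonic and efficient way to do this, in terms of memory consumption and execution speed, for example when the dictionary has a million elements? Generators, lambdas, something else?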

Answer 1

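heapq.nlargest is always the correct answer when the question is "How do I get a small number of maximum values from a huge set of inputs?" By using a heap, it keeps memory and CPU usage lower than just about anything else you can do in Python. Example: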

import heapq
from operator import itemgetter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

topitems = heapq.nlargest(4, items.items(), key=itemgetter(1))  # Use .iteritems() on Py2
topitemsasdict = dict(topitems)

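Calling sorted and slicing the result can win when the number of max items requested is a large fraction of the input, but for huge inputs and a small number of max items, the memory savings of heapq.nlargest will win.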
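For comparison, the sorted-and-slice alternative mentioned above might look like the following minimal sketch. It reuses the items dictionary from the question; the names top_sorted and subitems are only illustrative.

```python
from operator import itemgetter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

# Sort every (key, value) pair by value, largest first, then keep the first 4.
# This builds a full sorted list of all n pairs before slicing.
top_sorted = sorted(items.items(), key=itemgetter(1), reverse=True)[:4]
subitems = dict(top_sorted)
```

Because sorted is stable, keys with equal values keep their original relative order, so 'e' comes before 'g' here.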

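For the CS theory geeks: for an input of size n, selecting the k largest values, heapq.nlargest requires O(n log k) computation and k storage. sorted followed by slicing requires O(n log n) computation and n storage. So for 1024 inputs and 4 selected items, the work for nlargest is ~1024 * 2 computation with storage of 4; sorted + slicing would be ~1024 * 10 computation with storage of 1024.

In practice, the TimSort that Python uses in sorted has lower overhead than big-O notation can properly convey, and it usually performs better than the big-O notation would suggest. That is why, for example, when selecting the top 200 items out of 1024, sorted + slicing can still win. But nlargest has no pathological degradation for huge inputs and outputs: it may occasionally be slower, but usually not by much, whereas sorting can be faster but can also be much slower.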
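To make the "k storage" claim concrete, here is a rough sketch of a bounded min-heap selection. It is not how heapq.nlargest is implemented internally; top_k_by_value is a hypothetical helper written only for illustration, and it assumes the keys are mutually comparable (they are compared when values tie).

```python
import heapq

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

def top_k_by_value(mapping, k):
    """Hypothetical helper: return the k (key, value) pairs with the largest values."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, key) tuples; never holds more than k entries
    for key, value in mapping.items():
        if len(heap) < k:
            heapq.heappush(heap, (value, key))      # O(log k)
        elif value > heap[0][0]:
            # The new value beats the smallest kept value, so swap it in.
            heapq.heapreplace(heap, (value, key))   # O(log k)
    # Largest first; ties are ordered by key, descending.
    return [(key, value) for value, key in sorted(heap, reverse=True)]

subitems = dict(top_k_by_value(items, 4))
```

Each of the n items costs at most one O(log k) heap operation, and only k tuples are held at any time. Note that the order of tied entries differs from heapq.nlargest, which keeps the order in which items were first seen.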

Answer 2

Check the source code of the collections.Counter.most_common() method. It shows the best solution. And of course, the best approach is to use Counter() instead of {}.

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.iteritems(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.iteritems(), key=_itemgetter(1))
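The source quoted above is the Python 2 version (it calls iteritems). As a minimal usage sketch on Python 3, assuming the values are plain numbers as in the question, the same dictionary can be wrapped in a Counter and queried directly:

```python
from collections import Counter

items = {'a': 7, 'b': 12, 'c': 9, 'd': 0, 'e': 24, 'f': 10, 'g': 24}

# Counter accepts a mapping of element -> count; most_common(4) returns
# the 4 (key, value) pairs with the largest values, largest first.
top4 = Counter(items).most_common(4)
subitems = dict(top4)
```

Keys with equal values are returned in the order they were first encountered, so 'e' precedes 'g'.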
