跟踪扩展数组的中位数

问题描述 投票:0回答:5

面试问题:

编辑如下 给你一个数组。您可以从中创建 2 个堆,一个是最小堆,另一个是最大堆。现在使用这 2 个提供的堆在 O(nlog n) 时间内找到数组的中位数。

更正问题 数字是随机生成的并存储到(扩展)数组中。您将如何跟踪中位数?

解决方案 这个问题可以使用 2 个堆来解决,并且中位数始终可以在 O(1) 时间内访问。

algorithm heap complexity-theory
5个回答
13
投票

以下是如何使用这两个堆。请注意,我假设您不知道元素的数量,这就是为什么我们必须弹出,直到我们从最小堆中弹出的内容大于或等于从最大堆中弹出的内容为止。请注意,我们返回平均值,因为在像

{1, 2, 3, 4}
这样的集合的情况下,中位数实际上是
2.5
(两个“中间”值的平均值)。我假设
double
作为值类型,但这显然可以是任何东西。这里:

double min = minheap.pop();
double max = maxheap.pop();
while(min < max) {
    min = minheap.pop();
    max = maxheap.pop();
}

return (min + max) / 2;

由于弹出是

O(log n)
并且我们必须弹出
O(n / 2)
值,所以这是
O(n log n)


6
投票

Java 中的一个有效实现,使用 2 个堆,O(n log n)。在任何时候,我都会保持两个堆的大小平衡(即,如果我们输入 n 个元素使得 n%2==1,它们最多相差 1)。获取中位数的时间复杂度为 O(1)。添加新元素的时间复杂度为 O(log n)。

public class MedianOfStream {

    private int count;
    private PriorityQueue<Integer> highs, lows;

    public MedianOfStream() {
        highs = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg0.compareTo(arg1);
            }
        });
        lows = new PriorityQueue<Integer>(11, new Comparator<Integer>() {
            @Override
            public int compare(Integer arg0, Integer arg1) {
                return arg1.compareTo(arg0);
            }
        });
    }

    private int getMedian() {
        if (count == 0)
            return 0;
        if (lows.size() == highs.size()) {
            return (lows.peek() + highs.peek()) / 2;
        } else if (lows.size() < highs.size()) {
            return highs.peek();
        }
        return lows.peek();
    }

    private void swap(){
        int h = highs.poll();
        int l = lows.poll();
        highs.add(l);
        lows.add(h);
    }

    public int updateMedian(int n) {
        count++;

        if (count == 1)
            lows.add(n);

        else if (count==2) {
            highs.add(n);
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        else {
            if (n > highs.peek()) {
                lows.add(highs.poll()); // O(log n)
                highs.add(n); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
                lows.add(n); // O(log n)
            }
            if(highs.peek()<lows.peek()) {
                swap(); // O(log n)
            }
        }

        // if we added an even # of items,
        // the heaps must be exactly the same size,
        // otherwise we tolerate a 1-off difference

        if (Math.abs(lows.size() - highs.size()) > (count % 2)) {
            if (lows.size() < highs.size()) {
                lows.add(highs.poll()); // O(log n)
            } else {
                highs.add(lows.poll()); // O(log n)
            }
        }

        return getMedian(); // O(1)
    }
}

3
投票

从堆中弹出是一个 O(log N) 操作,因此您可以通过从其中一个堆中弹出一半元素并获取最后弹出的值来实现 O(N log N)(您必须处理边缘情况) 。但这并没有利用其他堆。

使用选择算法可以实现 O(N),但常数因子非常高。如果您已经有堆,前一个建议可能会更好。


0
投票

这是它的 python 实现。我测试了使用 Random.org 生成的一些示例,并在 median finder 上测试了它们,其中大多数似乎都有效。

import heapq

def medianFinder(arr):
    minHeap = []
    maxHeap = []
    def handleMedianFinder(num: int):
        if not len(maxHeap):
            heapq.heappush(maxHeap, -num)
            return -maxHeap[0]
        elif not len(minHeap):
            heapq.heappush(minHeap, num)
            return (minHeap[0]-maxHeap[0])/2
        if num < minHeap[0]:
            if len(maxHeap) > len(minHeap):
                oldMedian = -heapq.heappop(maxHeap)
                heapq.heappush(minHeap, oldMedian)
                heapq.heappush(maxHeap, -num)
                return (minHeap[0]-maxHeap[0])/2 if (len(minHeap) + len(maxHeap))%2 == 0 else minHeap[0]
            heapq.heappush(maxHeap, -num)
        elif num > -maxHeap[0]:
            if len(maxHeap) < len(minHeap):
                oldMedian = heapq.heappop(minHeap)
                heapq.heappush(maxHeap, -oldMedian)
                heapq.heappush(minHeap, num)
                return (minHeap[0]-maxHeap[0])/2 if (len(minHeap) + len(maxHeap))%2 == 0 else -maxHeap[0]
            heapq.heappush(minHeap, num)
        else: # between the medians (new medians)
            if len(maxHeap) < len(minHeap):
                oldMedian = heapq.heappop(minHeap)
                heapq.heappush(maxHeap, -oldMedian)
                heapq.heappush(minHeap, num)
            else:
                oldMedian = -heapq.heappop(maxHeap)
                heapq.heappush(minHeap, oldMedian)
                heapq.heappush(maxHeap, -num)
        if len(minHeap) < len(maxHeap):
            return minHeap[0]
        elif len(maxHeap) < len(minHeap):
            return -maxHeap[0]
        return (minHeap[0]-maxHeap[0])/2
    for num in arr:
        # print('maxHeap => ', maxHeap)
        # print('minHeap => ', minHeap)
        print(handleMedianFinder(num))

        
arr1 = [11, 18, 16, 12, 14, 4, 15, 10, 19, 20]
medianFinder(arr1)

-1
投票

使用两个堆的 JavaScript 解决方案:

function addNewNumber(minHeap, maxHeap, randomNumber) {
  if (maxHeap.size() === minHeap.size()) {
    if (minHeap.peek() && randomNumber > minHeap.peek()) {
      maxHeap.insert(minHeap.remove());
      minHeap.insert(randomNumber);
    } else {
      maxHeap.insert(randomNumber);
    }
  } else {
    if (randomNumber < maxHeap.peek()) {
      minHeap.insert(maxHeap.remove());
      maxHeap.insert(randomNumber);
    } else {
      minHeap.insert(randomNumber);
    }
  }
}

function getMedian(minHeap, maxHeap) {
  if (!maxHeap.size()) {
    return 0;
  }
  if (minHeap.size() === maxHeap.size()) {
    return (minHeap.peek() + maxHeap.peek()) / 2;
  } else {
    return maxHeap.peek();
  }
}
© www.soinside.com 2019 - 2024. All rights reserved.