Calculating average and percentiles from a histogram?

Question   Votes: 12   Answers: 2

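I have written a timer that measures the performance of a particular piece of code in any multithreaded application. The timer below also populates a map with the number of calls that took x milliseconds. I will use this map as part of my histogram for further analysis, such as what percentage of calls took a given number of milliseconds.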

public static class StopWatch {

    public static ConcurrentHashMap<Long, Long> histogram = new ConcurrentHashMap<Long, Long>();

    /**
     * Creates an instance of the timer and starts it running.
     */
    public static StopWatch getInstance() {
        return new StopWatch();
    }

    private long m_end = -1;
    private long m_interval = -1;
    private final long m_start;

    private StopWatch() {
        m_start = m_interval = currentTime();
    }

    /**
     * Returns in milliseconds the amount of time that has elapsed since the timer was created. If the
     * <code>stop</code> method has been invoked, then this returns instead the elapsed time between the creation of
     * the timer and the moment when <code>stop</code> was invoked.
     * 
     * @return duration it took
     */
    public long getDuration() {
        long result = 0;

        final long startTime = m_start;
        final long endTime = isStopWatchRunning() ? currentTime() : m_end;

        result = convertNanoToMilliseconds(endTime - startTime);

        boolean done = false;
        while (!done) {
            Long oldValue = histogram.putIfAbsent(result, 1L);
            if (oldValue != null) {
                done = histogram.replace(result, oldValue, oldValue + 1);
            } else {
                done = true;
            }
        }

        return result;
    }

    /**
     * Returns in milliseconds the amount of time that has elapsed since the last invocation of this same method. If
     * this method has not previously been invoked, then it is the amount of time that has elapsed since the timer
     * was created. <strong>Note</strong> that once the <code>stop</code> method has been invoked this will just
     * return zero.
     * 
     * @return interval period
     */
    public long getInterval() {
        long result = 0;

        final long startTime = m_interval;
        final long endTime;

        if (isStopWatchRunning()) {
            endTime = m_interval = currentTime();
        } else {
            endTime = m_end;
        }

        result = convertNanoToMilliseconds(endTime - startTime);

        return result;
    }

    /**
     * Stops the timer from advancing. This has an impact on the values returned by both the
     * <code>getDuration</code> and the <code>getInterval</code> methods.
     */
    public void stop() {
        if (isStopWatchRunning()) {
            m_end = currentTime();
        }
    }

    /**
     * What is the current time in nanoseconds?
     * 
     * @return returns back the current time in nanoseconds
     */
    private long currentTime() {
        return System.nanoTime();
    }

    /**
     * This is used to check whether the timer is alive or not
     * 
     * @return checks whether the timer is running or not
     */
    private boolean isStopWatchRunning() {
        return (m_end <= 0);
    }

    /**
     * This is used to convert NanoSeconds to Milliseconds
     * 
     * @param nanoseconds
     * @return milliseconds value of nanoseconds
     */
    private long convertNanoToMilliseconds(final long nanoseconds) {
        return nanoseconds / 1000000L;
    }
}

For example, I use the timer class above to measure the performance of a particular piece of code in my multithreaded application:

StopWatch timer = StopWatch.getInstance();
//... some code here to measure
timer.getDuration();

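Now my question is: what is the best way to calculate the average, median, 95th percentile and 99th percentile of the requests? In other words, I want to add methods to my StopWatch class that do all of these calculations: the average, the median, and the 95th and 99th percentiles.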

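Then I can get these values directly from the StopWatch class.

My histogram will look like this:

key: the number of milliseconds

value: the number of calls that took that many milliseconds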
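The question also mentions reporting what percentage of calls took a given number of milliseconds. A minimal sketch, assuming the histogram has exactly the key/value meaning described above, is to copy it into a sorted TreeMap and accumulate the counts. The class and method names (`CumulativePercent`, `percentAtOrBelow`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class CumulativePercent {

    // Hypothetical helper: for each millisecond key, the percentage of calls
    // that took that many milliseconds or fewer.
    public static Map<Long, Double> percentAtOrBelow(Map<Long, Long> histogram) {
        // Sorted snapshot, so concurrent updates cannot change counts mid-scan
        TreeMap<Long, Long> sorted = new TreeMap<>(histogram);

        long total = 0L;
        for (long count : sorted.values()) {
            total += count;
        }

        Map<Long, Double> result = new LinkedHashMap<>();
        long running = 0L;
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            running += e.getValue();
            result.put(e.getKey(), 100.0 * running / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Long> histogram = new TreeMap<>();
        histogram.put(1L, 5L);
        histogram.put(2L, 3L);
        histogram.put(4L, 2L);
        System.out.println(percentAtOrBelow(histogram)); // {1=50.0, 2=80.0, 4=100.0}
    }
}
```

Reading the result at key x gives the share of calls that finished within x ms; the 95th percentile is, roughly, the first key whose cumulative percentage reaches 95.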

java multithreading timer thread-safety concurrenthashmap
2 Answers
3 votes

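The average is straightforward to implement. The median is the 50th percentile, so you only need a single percentile method that works correctly, plus a utility method for the median. There are several variations of the percentile calculation, but this function should produce the same results as the Microsoft Excel PERCENTILE.INC function.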
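For reference, PERCENTILE.INC on N sorted observations $x_1 \le \dots \le x_N$ and a fraction $p \in [0, 1]$ uses a 1-based rank and linear interpolation between neighbouring observations. In the code, `n` plays the role of $r$ and `d` is the fractional part $d$:

$$
r = p\,(N-1) + 1, \qquad k = \lfloor r \rfloor, \qquad d = r - k
$$

$$
P(p) = x_k + d\,(x_{k+1} - x_k), \quad \text{with } P(p) = x_N \text{ when } k = N
$$

For example, the median of 1, 2, 3, 4 has r = 2.5, k = 2, d = 0.5, so P = 2 + 0.5 × (3 - 2) = 2.5.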

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class HistogramStatistics
{
    public static Double average(final Map<Long, Long> histogram)
    {
        return HistogramStatistics.mean(histogram);
    }

    public static Double mean(final Map<Long, Long> histogram)
    {
        double sum = 0d;
        long observations = 0L;

        // SUM(value x frequency) and SUM(frequency), read from the same entries
        for (Map.Entry<Long, Long> entry : histogram.entrySet())
        {
            sum += (double) entry.getKey() * entry.getValue();
            observations += entry.getValue();
        }

        return sum / (double) observations;
    }

    public static Double median(final Map<Long, Long> histogram)
    {
        return HistogramStatistics.percentile(histogram, 0.50d);
    }

    public static Double percentile(final Map<Long, Long> histogram, final double percent)
    {
        if ((percent < 0d) || (percent > 1d))
        {
            throw new IllegalArgumentException("Percentile must be between 0.00 and 1.00.");
        }

        if ((histogram == null) || histogram.isEmpty())
        {
            return null;
        }

        // Sorted snapshot, so a concurrently updated map cannot change the
        // counts between computing the total and scanning the bins.
        NavigableMap<Long, Long> bins = new TreeMap<Long, Long>(histogram);

        // 1-based rank of the requested percentile, as in PERCENTILE.INC
        double n = (percent * (HistogramStatistics.count(bins).doubleValue() - 1d)) + 1d;
        double d = n - Math.floor(n);
        long observationsBelowBinInclusive = 0L;
        Long lowBin = bins.firstKey();

        Double valuePercentile = null;

        for (Map.Entry<Long, Long> bin : bins.entrySet())
        {
            Long highBin = bin.getKey();
            long observationsBelowBinExclusive = observationsBelowBinInclusive;
            observationsBelowBinInclusive += bin.getValue();

            if (n <= observationsBelowBinInclusive)
            {
                // If the observation at rank floor(n) also falls in this bin,
                // both interpolation endpoints are this bin's value.
                if ((d == 0d) || (observationsBelowBinExclusive < Math.floor(n)))
                {
                    lowBin = highBin;
                }

                valuePercentile = lowBin.doubleValue() + ((highBin - lowBin) * d);

                break;
            }

            lowBin = highBin;
        }

        return valuePercentile;
    }

    public static Long count(final Map<Long, Long> histogram)
    {
        long observations = 0L;

        for (Long frequency : histogram.values())
        {
            observations += frequency;
        }

        return observations;
    }
}
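For small test histograms, one way to sanity-check a histogram-based percentile is to expand the histogram into a sorted array and apply the PERCENTILE.INC formula directly. The class `BruteForcePercentile` is a hypothetical test helper, not part of the answer; its memory use grows with the number of observations, so it is only suitable for tests.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class BruteForcePercentile {

    // Expands {value -> frequency} into one array element per observation,
    // then applies PERCENTILE.INC interpolation on the sorted array.
    public static double percentileInc(Map<Long, Long> histogram, double p) {
        long total = 0L;
        for (long frequency : histogram.values()) {
            total += frequency;
        }
        if (total == 0L) {
            throw new IllegalArgumentException("empty histogram");
        }

        long[] xs = new long[Math.toIntExact(total)];
        int i = 0;
        for (Map.Entry<Long, Long> e : histogram.entrySet()) {
            for (long c = 0; c < e.getValue(); c++) {
                xs[i++] = e.getKey();
            }
        }
        Arrays.sort(xs);

        double r = p * (xs.length - 1) + 1; // 1-based rank
        int k = (int) Math.floor(r);
        double d = r - k;
        if (k >= xs.length) {
            return xs[xs.length - 1]; // p == 1: the largest observation
        }
        // xs[k - 1] is x_k and xs[k] is x_{k+1} in 1-based notation
        return xs[k - 1] + d * (xs[k] - xs[k - 1]);
    }

    public static void main(String[] args) {
        Map<Long, Long> histogram = new TreeMap<>();
        histogram.put(1L, 1L);
        histogram.put(2L, 2L);
        System.out.println(percentileInc(histogram, 0.25)); // 1.5
    }
}
```

The histogram {1: 1, 2: 2} is a useful test case because the two interpolation endpoints (1 and 2) fall in different bins even though the upper bin holds more than one observation.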

1 vote

Given a histogram (frequency list) like the following:

Value | Frequency
------+----------
    1 | 5
    2 | 3
    3 | 1
    4 | 7
    5 | 2
    ..

where each Value has occurred Frequency times, the mean is the sum of Value × Frequency over all rows, divided by the sum of the frequencies.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public static double getMean(ConcurrentHashMap<Long, Long> histogram)
{
    double a = 0; // running SUM(value x frequency)
    double b = 0; // running SUM(frequency)

    // Iterate over entries so each value and its frequency come from the same read
    for (Map.Entry<Long, Long> entry : histogram.entrySet())
    {
        // a = a + (value x frequency)
        a = a + ((double) entry.getKey() * entry.getValue());

        // b = b + frequency
        b = b + entry.getValue();
    }

    // mean = SUM(value x frequency) / SUM(frequency)
    return a / b;
}
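As a usage example, the same formula can be applied to just the five rows shown in the table above, treating them as the whole histogram since the table is truncated: SUM(value × frequency) = 5 + 6 + 3 + 28 + 10 = 52 and SUM(frequency) = 18, so the mean is 52 / 18 ≈ 2.889. The wrapper class `MeanExample` is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MeanExample {

    // Same computation as getMean above: SUM(value x frequency) / SUM(frequency)
    public static double getMean(Map<Long, Long> histogram) {
        double weightedSum = 0;
        double observations = 0;
        for (Map.Entry<Long, Long> entry : histogram.entrySet()) {
            weightedSum += (double) entry.getKey() * entry.getValue();
            observations += entry.getValue();
        }
        return weightedSum / observations;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Long, Long> histogram = new ConcurrentHashMap<>();
        histogram.put(1L, 5L);
        histogram.put(2L, 3L);
        histogram.put(3L, 1L);
        histogram.put(4L, 7L);
        histogram.put(5L, 2L);
        System.out.println(getMean(histogram)); // 52 / 18, about 2.889
    }
}
```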

0 votes

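You may want to round the measured durations to some desired resolution, for example units of 10 or 100 ms, so that your map does not get bloated with every possible latency value.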
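A minimal sketch of that rounding, assuming a fixed resolution in milliseconds: integer division rounds each duration down to the lower edge of its bucket, so with a 10 ms resolution, 0 to 9 ms share key 0, 10 to 19 ms share key 10, and so on. `Buckets` and `bucketOf` are hypothetical names.

```java
public class Buckets {

    // Rounds a non-negative duration down to the start of its bucket.
    // Assumes resolutionMillis > 0.
    public static long bucketOf(long durationMillis, long resolutionMillis) {
        return (durationMillis / resolutionMillis) * resolutionMillis;
    }

    public static void main(String[] args) {
        System.out.println(bucketOf(7, 10));    // 0
        System.out.println(bucketOf(123, 10));  // 120
        System.out.println(bucketOf(123, 100)); // 100
    }
}
```

The trade-off is that any mean or percentile computed from the bucketed map is only accurate to within one bucket width.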

You could also use an array instead of a map, which gives worst-case O(1) lookups and the benefit of memory locality.
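A minimal sketch of the array-based variant, assuming you can pick an upper bound for the durations you care about: a `java.util.concurrent.atomic.AtomicLongArray` indexed by bucket gives O(1) thread-safe increments, and durations beyond the last bucket are clamped into it so no sample is lost. `ArrayHistogram` and its constructor parameters are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class ArrayHistogram {

    private final long resolutionMillis;
    private final AtomicLongArray counts;

    // Buckets cover [0, resolutionMillis * bucketCount); the last bucket also
    // absorbs every longer duration.
    public ArrayHistogram(long resolutionMillis, int bucketCount) {
        this.resolutionMillis = resolutionMillis;
        this.counts = new AtomicLongArray(bucketCount);
    }

    public void record(long durationMillis) {
        long index = Math.max(0L, durationMillis) / resolutionMillis;
        int last = counts.length() - 1;
        counts.incrementAndGet((int) Math.min(index, last));
    }

    // Count for the bucket starting at index * resolutionMillis
    public long countAt(int index) {
        return counts.get(index);
    }

    public static void main(String[] args) {
        ArrayHistogram h = new ArrayHistogram(10, 100); // last bucket: 990 ms and up
        h.record(3);
        h.record(7);
        h.record(15);
        h.record(5000); // clamped into the last bucket
        System.out.println(h.countAt(0));  // 2
        System.out.println(h.countAt(1));  // 1
        System.out.println(h.countAt(99)); // 1
    }
}
```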

Also, instead of the while (!done) loop in getDuration(), you can use a LongAdder or AtomicLong, which should be faster.
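A minimal sketch of the LongAdder variant, assuming the map's value type can change from Long to `java.util.concurrent.atomic.LongAdder`: `ConcurrentHashMap.computeIfAbsent` creates each counter atomically exactly once, and `LongAdder.increment()` then replaces the putIfAbsent/replace retry loop. If you keep Long values instead, `histogram.merge(result, 1L, Long::sum)` also performs the increment atomically in a single call. `AdderHistogram` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class AdderHistogram {

    private final ConcurrentHashMap<Long, LongAdder> counts = new ConcurrentHashMap<>();

    // Thread-safe increment without a retry loop
    public void record(long millis) {
        counts.computeIfAbsent(millis, k -> new LongAdder()).increment();
    }

    // Sorted snapshot as plain longs, suitable for the mean/percentile code
    public Map<Long, Long> snapshot() {
        Map<Long, Long> copy = new TreeMap<>();
        counts.forEach((millis, adder) -> copy.put(millis, adder.sum()));
        return copy;
    }

    public static void main(String[] args) throws InterruptedException {
        AdderHistogram h = new AdderHistogram();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    h.record(i % 2); // alternate between 0 ms and 1 ms
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println(h.snapshot()); // {0=2000, 1=2000}
    }
}
```

Taking a snapshot before computing statistics means the mean and percentiles are computed from one consistent set of counts while other threads keep recording.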

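As for how to reliably compute percentiles on a bucketed histogram, you can take a look at HBPE for a reference implementation. Disclaimer: I am the author.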
