How to find the median of an even number of rows in MySQL

Question (votes: 0, answers: 1)

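I found several solutions for finding the median on StackOverflow and other sites, but all of them rely on the same principle: the median is the value for which half of the rows are smaller and the other half are larger. For an even number of rows, however, the median is the average of the two middle values. How can I compute that in MySQL?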

mysql
1 answer
1 vote

Given the following schema:

CREATE TABLE numbers (
  i INT AUTO_INCREMENT PRIMARY KEY,
  n INT,
  INDEX(n)
);

We want to find the median of column n.

ROW_NUMBER() and a CTE (requires MySQL 8 or MariaDB 10.2)

with sorted as (
    select t.n, row_number() over (order by t.n) as rn
    from numbers t
), cnt as (
    select count(*) as c from numbers
)
select avg(s.n) as median
from sorted s
cross join cnt
where s.rn between floor((c+1)/2) and ceil((c+1)/2);

Performance: OK (140 ms for 100K rows)
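The bounds floor((c+1)/2) and ceil((c+1)/2) are what make this query work for both odd and even row counts: they coincide when c is odd and differ by one when c is even. The following is a minimal Python sketch of that arithmetic only, not of the query itself; the function name median_by_row_number is hypothetical, and returning None for an empty input is an assumption meant to mirror AVG over no rows.

```python
import math

def median_by_row_number(values):
    """Average the rows whose 1-based rank lies in [floor((c+1)/2), ceil((c+1)/2)]."""
    ordered = sorted(values)        # stands in for ORDER BY n
    c = len(ordered)                # stands in for COUNT(*)
    if c == 0:
        return None                 # assumption: mirrors AVG() over zero rows
    lo = math.floor((c + 1) / 2)    # lower middle rank
    hi = math.ceil((c + 1) / 2)     # upper middle rank (equals lo when c is odd)
    middle = ordered[lo - 1:hi]     # ranks are 1-based, slices are 0-based
    return sum(middle) / len(middle)

print(median_by_row_number([5, 1, 3]))      # odd count, one middle row: 3.0
print(median_by_row_number([4, 1, 3, 2]))   # even count, two middle rows: 2.5
```

For c = 4 the bounds are 2 and 3, so the second and third values are averaged; for c = 3 both bounds are 2.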

Temporary table with AUTO_INCREMENT

drop temporary table if exists tmp;
create temporary table tmp(
    rn int auto_increment primary key,
    n int
) engine=memory;

insert into tmp(n) 
    select n
    from numbers
    order by n;

select avg(n) as median
from tmp
cross join (select count(*) as c from numbers) cnt
where rn between floor((c+1)/2) and ceil((c+1)/2);

Performance: OK (110 ms for 100K rows)

Prepared statement with computed LIMIT / OFFSET

set @c = (select count(*) from numbers);
set @limit = 2 - (@c % 2);
set @offset = (@c+1) div 2 - 1;

prepare stmt from '
    select avg(n) as median
    from (
        select n
        from numbers
        order by n
        limit ? offset ?
    ) sub
';

execute stmt using @limit, @offset;

Performance: best (50 ms for 100K rows)
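The LIMIT and OFFSET values follow from the row count: one row is taken when the count is odd, two when it is even, starting at the lower middle row. This only selects the middle rows if the rows are read in order of n. Below is a rough Python sketch of the same arithmetic; the function name median_by_limit_offset is hypothetical, and the None result for empty input is an assumption of the sketch rather than a statement about how the SQL behaves on an empty table.

```python
def median_by_limit_offset(values):
    """Pick the middle row(s) from the sorted values via a computed limit and offset."""
    ordered = sorted(values)                   # rows must be read in order of n
    c = len(ordered)                           # set @c = (select count(*) ...)
    if c == 0:
        return None                            # assumption: the sketch skips empty input
    limit = 2 - (c % 2)                        # 1 row if c is odd, 2 rows if c is even
    offset = (c + 1) // 2 - 1                  # 0-based position of the lower middle row
    window = ordered[offset:offset + limit]    # LIMIT limit OFFSET offset
    return sum(window) / len(window)

print(median_by_limit_offset([5, 1, 3]))       # limit 1, offset 1: 3.0
print(median_by_limit_offset([4, 1, 3, 2]))    # limit 2, offset 1: 2.5
```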

Cross join

Using primary key

select avg(n) as median
from (
  select t.n
  from numbers t
  cross join numbers t2
  group by t.i
  having greatest(sum(t2.n < t.n), sum(t2.n > t.n))
      <= (select count(*) from numbers) / 2
) sub;
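The HAVING clause keeps a row when neither the number of smaller values nor the number of larger values exceeds half the row count. Here is a small Python sketch of that counting logic, written as a brute-force loop to mirror the O(n²) join; the function name median_by_cross_join is hypothetical and the code is an illustration, not a translation of the query plan.

```python
def median_by_cross_join(values):
    """Keep each row whose smaller/larger counts are both at most c/2, then average."""
    c = len(values)
    kept = []
    for x in values:                                # one group per row (GROUP BY t.i)
        less = sum(1 for y in values if y < x)      # sum(t2.n < t.n)
        greater = sum(1 for y in values if y > x)   # sum(t2.n > t.n)
        if max(less, greater) <= c / 2:             # greatest(...) <= count(*) / 2
            kept.append(x)
    return sum(kept) / len(kept) if kept else None  # avg(n); None for empty input

print(median_by_cross_join([4, 1, 3, 2]))   # keeps 2 and 3: 2.5
print(median_by_cross_join([5, 1, 3]))      # keeps 3 only: 3.0
```

In this sketch, duplicate values next to the middle can skew the average: for [1, 1, 2, 3] it keeps 1, 1 and 2 and returns 4/3, whereas the median is 1.5. Since the query applies the same per-row counts, it seems worth testing it on data with ties before relying on it.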

Without primary key

select avg(n) as median
from (
    select t.n
    from numbers t
    cross join numbers t2
    group by t.n
    having greatest(sum(t2.n < t.n), sum(t2.n > t.n)) / sqrt(SUM(t2.n = t.n))
        <= (select count(*) from numbers)/2
) sub;
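Grouping by n instead of i changes the counts: a value that occurs k times contributes k rows on the t side of the join, so both sums are inflated by a factor of k, while SUM(t2.n = t.n) becomes k². Dividing by the square root of that sum removes the inflation. The Python sketch below mimics this under those assumptions; the function name median_by_grouped_cross_join is hypothetical.

```python
import math
from collections import Counter

def median_by_grouped_cross_join(values):
    """Mimic GROUP BY t.n: one group per distinct value, counts scaled by multiplicity."""
    c = len(values)
    kept = []
    for v, k in Counter(values).items():                # k = number of rows equal to v
        less = k * sum(1 for y in values if y < v)      # sum(t2.n < t.n) over the group
        greater = k * sum(1 for y in values if y > v)   # sum(t2.n > t.n) over the group
        equal = k * k                                   # sum(t2.n = t.n) over the group
        if max(less, greater) / math.sqrt(equal) <= c / 2:
            kept.append(v)                              # each distinct value kept once
    return sum(kept) / len(kept) if kept else None      # avg(n); None for empty input

print(median_by_grouped_cross_join([1, 1, 2, 3]))   # keeps 1 and 2: 1.5
print(median_by_grouped_cross_join([1, 2, 2, 3]))   # keeps 2 only: 2.0
```

Because each distinct value is kept at most once, the average is taken over one or two distinct middle values.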

Performance: worst, O(n²) (500 ms for 1K rows)
