How can I use SQL to compute an average from dictionary-style lookup values?


My data frame looks like this:

id            value
a       0:3,1:0,2:0,3:4
a       0:0,1:0,2:2,3:0
a       0:0,1:5,2:4,3:0

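I want to write a query that computes the average of the keys in the value column, where each key is counted as many times as the number that follows it.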

For example, for 0:3,1:0,2:0,3:4 it must be (0+0+0+3+3+3+3)/7 = 1.71

For 0:0,1:0,2:2,3:0 it must be (2+2)/2 = 2

For 0:0,1:5,2:4,3:0 it must be (1+1+1+1+1+2+2+2+2)/9 = 1.44

So the desired result is:

id            value
a              1.71
a              2.00
a              1.44

How can I do this? Is there a SQL function that produces this result?

mysql sql presto
2 Answers

See this DBFIDDLE.

Code:

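-- Builds and runs a statement of the form
--   SELECT (k1*c1+k2*c2+...)/(+c1+c2+...)
CREATE  PROCEDURE `avg_dict`(s varchar(100))
BEGIN
  -- Numerator: ':' becomes '*' and ',' becomes '+'.
  -- Denominator: each "key:" prefix (and the comma before it) becomes '+'.
  SET @result = CONCAT('SELECT (', replace(replace(s, ":","*"),",","+"), ')/(',regexp_replace(s,",?[0-9]:","+"),')');
  PREPARE stmt FROM @result;
  EXECUTE stmt  ;
  DEALLOCATE PREPARE stmt;
END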

Result:

stmt                                 output
CALL avg_dict("0:3,1:0,2:0,3:4");    1.7143
CALL avg_dict("0:0,1:0,2:2,3:0");    2.0000
CALL avg_dict("0:0,1:5,2:4,3:0");    1.4444
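
To make the string rewriting in the procedure easier to follow, here is a minimal Python sketch of the two expressions it concatenates. The helper name build_expressions is hypothetical and not part of the answer; the sketch only mirrors the replace and regexp_replace calls and does not run any SQL.

```python
import re

def build_expressions(s):
    """Mirror the two string rewrites done inside avg_dict (illustrative only)."""
    # Numerator: "0:3,1:0" -> "0*3+1*0" (each key multiplied by its count).
    numerator = s.replace(":", "*").replace(",", "+")
    # Denominator: replace each "key:" prefix (and the comma before it)
    # with "+", which leaves only the counts: "0:3,1:0" -> "+3+0".
    denominator = re.sub(r",?[0-9]:", "+", s)
    return numerator, denominator

num, den = build_expressions("0:3,1:0,2:0,3:4")
print(num)  # 0*3+1*0+2*0+3*4
print(den)  # +3+0+0+4
```

The generated statement is therefore SELECT (0*3+1*0+2*0+3*4)/(+3+0+0+4), that is 12/7. Note that the pattern [0-9] matches a single digit, so in this sketch a key such as 10 is only partly stripped ("10:3" becomes "1+3"); the same pattern in the procedure would presumably need [0-9]+ if keys can have more than one digit.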


You can achieve this with a combination of split, transform, and repeat:

WITH dataset(id, value) AS (
    values ('a', '0:3,1:0,2:0,3:4'),
        ('a', '0:0,1:0,2:2,3:0'),
        ('a', '0:0,1:5,2:4,3:0')
)

SELECT id,
    reduce(arr, 0.0, (s, x)->s + x, s->s) / cardinality(arr)
FROM(
        SELECT *,
            flatten(
                transform(
                    transform(
                        split(value, ','),
                        s->split(s, ':')
                    ),
                    arr->repeat(
                        cast(arr [ 1 ] as INTEGER),
                        cast(arr [ 2 ] as INTEGER)
                    )
                )
            ) as arr
        FROM dataset
    )

Output:

id _col1
a 1.7142857142857142
a 2.0
a 1.4444444444444444
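
As a rough cross-check of what the query does step by step, the same pipeline can be sketched in Python. The function names expand and average are hypothetical; the sketch mirrors the split, repeat, and flatten steps and is not Presto code.

```python
def expand(value):
    """Turn '0:3,1:0,2:0,3:4' into [0, 0, 0, 3, 3, 3, 3]."""
    # split(value, ',') followed by split(s, ':') gives [key, count] pairs.
    pairs = [item.split(":") for item in value.split(",")]
    # repeat(key, count) for each pair, flattened into a single list.
    return [int(key) for key, count in pairs for _ in range(int(count))]

def average(value):
    arr = expand(value)
    # Same as reduce(arr, 0.0, ...) / cardinality(arr) in the query.
    return sum(arr) / len(arr)

print(expand("0:0,1:0,2:2,3:0"))   # [2, 2]
print(average("0:0,1:0,2:2,3:0"))  # 2.0
```

In this sketch a row whose counts are all zero produces an empty list and raises ZeroDivisionError, so such rows would need separate handling.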

Note:

The outer select can be replaced with array_average, but I used the select because Athena's version of Presto does not support it.

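Update

Another, more performant version, which reuses the dataset CTE defined above and avoids building the repeated array: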

SELECT id,
    reduce(
        arr,
        CAST(ROW(0.0, 0) AS ROW(sum DOUBLE, count INTEGER)),
        (s, r)->CAST(
            ROW(r.num * r.count + s.sum, s.count + r.count) AS ROW(sum DOUBLE, count INTEGER)
        ),
        s->IF(s.count = 0, NULL, s.sum / s.count)
    )
FROM(
        SELECT *,
            transform(
                split(value, ','),
                s->CAST(
                    ROW(
                        CAST(split(s, ':') [ 1 ] AS DOUBLE),
                        (CAST(split(s, ':') [ 2 ] AS INTEGER))
                    ) AS ROW(num DOUBLE, count INTEGER)
                )
            ) as arr
        FROM dataset
    )
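
The idea behind this version is to carry a running (sum, count) pair instead of materializing every repeated key. A minimal Python sketch of that accumulation follows; weighted_average is a hypothetical name, and the sketch only illustrates the arithmetic, not Presto's reduce.

```python
def weighted_average(value):
    """Average of keys weighted by their counts, e.g. '0:0,1:5,2:4,3:0'."""
    total, count = 0.0, 0  # plays the role of ROW(sum DOUBLE, count INTEGER)
    for item in value.split(","):
        num, cnt = item.split(":")
        total += float(num) * int(cnt)  # r.num * r.count + s.sum
        count += int(cnt)               # s.count + r.count
    # Matches IF(s.count = 0, NULL, s.sum / s.count) in the query.
    return None if count == 0 else total / count

print(weighted_average("0:0,1:0,2:2,3:0"))  # 2.0
print(weighted_average("0:0,1:0"))          # None
```

Because the work is proportional to the number of key:count pairs rather than to the sum of the counts, this form should scale better when the counts are large.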