I have a table structured like this:
|---------------------|----------|-----------|
| col_1 | col_2 | col_3 |
|---------------------|----------|-----------|
| 2018-01-15 17:56 | A | 3 |
|---------------------|----------|-----------|
| 2018-01-15 17:56 | A | 2 |
|---------------------|----------|-----------|
| 2018-10-23 23:43 | B | True |
|---------------------|----------|-----------|
| 2018-10-23 23:43 | B | False |
|---------------------|----------|-----------|
| 2018-10-23 23:43 | A | 3 |
|---------------------|----------|-----------|
| 2018-10-23 23:43 | B | True |
|---------------------|----------|-----------|
I want to group by col_1 and, where col_2 is A, take the average of col_3; where col_2 is B, take the most frequent value of col_3. The expected result is:
|---------------------|----------|-----------|
| col_1 | A | B |
|---------------------|----------|-----------|
| 2018-01-15 17:56 | 2.5 | Null |
|---------------------|----------|-----------|
| 2018-10-23 23:43 | 3 | True |
|---------------------|----------|-----------|
There is no built-in frequency function for the case where col_2 is B. I know I can do something like this:
select col_1,
avg(case when col_2='A' then col_3 end) as A
from my_table
group by col_1
How can I add the most-frequent-value calculation for the rows where col_2 is B?
Use analytic functions; see the comments in the code:
with my_table as (
select stack(6,
'2018-01-15 17:56','A', '3' ,
'2018-01-15 17:56','A', '2' ,
'2018-10-23 23:43','B', 'True' ,
'2018-10-23 23:43','B', 'False',
'2018-10-23 23:43','A', '3' ,
'2018-10-23 23:43','B', 'True' ) as (col_1 , col_2, col_3)
)
select col_1, --final aggregation by col_1
max(avg) as A,
max(most_frequent) as B
from(
select col_1, col_2, col_3, cnt, --calculate avg and most_frequent
case when col_2='A' then avg(col_3) over(partition by col_1, col_2) else null end as avg,
case when col_2='B' then first_value(col_3) over(partition by col_1, col_2 order by cnt desc) else null end as most_frequent
from
(
select col_1, col_2, col_3, --calculate count
case when col_2='B' then count(*) over(partition by col_1, col_2, col_3) else null end as cnt
from my_table
)s
)s
group by col_1
;
Result:
col_1 a b
2018-01-15 17:56 2.5 NULL
2018-10-23 23:43 3.0 True
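
One thing to watch with the query above: if two col_3 values are equally frequent within a group, first_value ordered only by cnt may return either of them. Below is a minimal sketch of a variant that picks the winner with row_number() and an explicit tie-breaker. The tie-break rule (smallest col_3 in string order), the subquery aliases c and r, and the explicit cast to double are my own assumptions and are not part of the original question, so adjust them to your data.

```sql
with my_table as (
select stack(6,
'2018-01-15 17:56','A', '3' ,
'2018-01-15 17:56','A', '2' ,
'2018-10-23 23:43','B', 'True' ,
'2018-10-23 23:43','B', 'False',
'2018-10-23 23:43','A', '3' ,
'2018-10-23 23:43','B', 'True' ) as (col_1 , col_2, col_3)
)
select col_1,
       -- average over the A rows only; col_3 is a string in this sample, so cast it
       avg(case when col_2='A' then cast(col_3 as double) end) as A,
       -- keep only the top-ranked B value; max() just collapses it to one row per col_1
       max(case when col_2='B' and rn=1 then col_3 end) as B
from(
select col_1, col_2, col_3,
       -- rank values by frequency; on ties the smallest col_3 wins (assumed rule)
       row_number() over(partition by col_1, col_2 order by cnt desc, col_3) as rn
  from
      (
       select col_1, col_2, col_3,
              -- how often each col_3 value occurs within (col_1, col_2)
              count(*) over(partition by col_1, col_2, col_3) as cnt
         from my_table
      )c
)r
group by col_1
;
```

On the sample data this should give the same two rows as the result above, because True occurs twice and False once, so the tie-breaker never comes into play.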