Hive: Find the top 20% of records

Question

I have some data:

I need to find the top 20% of prices.

Expected output:

ID  PRICE
5   320
6   300

Answer 1

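You can do this without a join. Use an analytic function to compute max(price), take 80% of it, then keep the rows whose price is greater than that threshold: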

with your_data as ( --this is your data
select stack(10,
1 ,  100,
2 ,  200,
3 ,  120,
4 ,  130,
5 ,  320,
6 ,  300,
7 ,  200,
8 ,  100,
9 ,  120,
10,  250) as (ID,  PRICE)
)

select id, price 
from
(
select d.*, max(price) over()*0.8 as pct_80 from your_data d
)s where price>pct_80

Result:

OK
id      price
6       300
5       320

Use your own table instead of the WITH subquery, and add ORDER BY id if needed.
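
As a minimal sketch of that substitution, assuming a hypothetical table named price_table with columns id and price (the real table name is not given in the question), the query might look like this:

```sql
-- Assumption: price_table is a placeholder name for your own table,
-- with the same id and price columns as the sample data.
select id, price
from (
  -- max(price) over () computes the maximum across all rows,
  -- so pct_80 is the same threshold on every row.
  select d.*, max(price) over () * 0.8 as pct_80
  from price_table d
) s
where price > pct_80
order by id;  -- optional: makes the output order deterministic
```

With the sample rows above, the threshold is 320 * 0.8 = 256, so only ids 5 and 6 pass the filter.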

Answer 2

Here are the queries:

with top_20 as (
  select 
    max(price)*0.8 as price1 
  from 
    <tableName>
)
select * from <tableName> t1 , top_20 t2 where t1.price > t2.price1;

select 
 name, 
 price 
from 
 (select 
    name, 
    price, 
    max(price) over () * 0.8 as top_20 
 from <tableName>
 ) t1 
where 
 t1.price > t1.top_20;

The following queries will not work in Hive:

select * from <tableName> where price > (select max(price)*0.8 from <tableName>)

select * from <tableName> t1 where exists (select price from <tableName> t2 where t1.price > t2.price*0.8)

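Reason: Hive does not support a subquery in the WHERE clause combined with a comparison condition such as = or >. It supports subqueries there only with IN, NOT IN, EXISTS, and NOT EXISTS.

Even with EXISTS and NOT EXISTS, it supports only equi-join conditions in the correlated predicate. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries#LanguageManualSubQueries-SubqueriesintheWHEREClause for more details.

Hope this helps.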

Answer 3

Here is a way to do this without using a join.

select id, price
from (
  select id, price,
         row_number() over (order by price desc) r1,
         count(*) over () * (20/100) ct
  from table_name
) final
where r1 <= ct;
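
Note that this approach selects the top 20% of rows by count, whereas the other answers keep rows whose price exceeds 80% of the maximum price. On the ten sample rows from Answer 1 the two definitions happen to agree. A sketch to check this, reusing the your_data sample from Answer 1 and writing 0.2 directly in place of 20/100 (the alias ranked is just an illustrative name):

```sql
with your_data as ( -- same sample rows as in Answer 1
select stack(10,
1 ,  100,
2 ,  200,
3 ,  120,
4 ,  130,
5 ,  320,
6 ,  300,
7 ,  200,
8 ,  100,
9 ,  120,
10,  250) as (id, price)
)

select id, price
from (
  select id, price,
         -- rank rows from the highest price down
         row_number() over (order by price desc) as r1,
         -- 20% of the total row count
         count(*) over () * 0.2 as ct
  from your_data
) ranked
where r1 <= ct;
```

Here count(*) over () is 10, so ct is 2 and the two highest-priced rows (ids 5 and 6) are returned. On other data the two definitions can give different results, and rows that tie on price at the cutoff are numbered in an arbitrary order by row_number().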
