Hive: how to output the total row count as a variable

Question

I have a dataset that I want to deduplicate with the following code:

select session_id, sol_id, id, session_context_code
    from (
        select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
        substr(case_id,2,9) as id

        from df.t1_data
         )undup
        where undup.rn =1 
        order by session_id, sol_id, date

I want to add a variable that stores the total number of rows after the dedup, and tried using count(*):

select session_id, sol_id, id, session_context_code, count(*) as total
    from (
        select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
        substr(case_id,2,9) as id

        from df.t1_data
         )undup
        where undup.rn =1 
        order by session_id, sol_id, date

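The error I get:

Error: Execute error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 1:44 Expression not in GROUP BY key 'session_id'

I just want to output the count as a variable that, after deduplicating on the row number, counts all distinct records by session_id and sol_id. How can I incorporate this into the code?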

hadoop hive hql hiveql cloudera
1 Answer

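A Hive query that uses COUNT(*) together with other, non-aggregated columns in the SELECT clause must list those columns in a GROUP BY clause. Without it, Hive cannot tell which rows each count belongs to and raises Error 10025.

Some examples: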

SELECT COUNT(*) FROM employees;

SELECT id, name, COUNT(*) FROM employees GROUP BY id, name;
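
To make the rule concrete, here is a minimal sketch on the same hypothetical employees table (its columns id and name are assumed from the samples above). The commented-out statement is the shape that triggers Error 10025; the second one is the grouped form that compiles.

```sql
-- Fails with "Expression not in GROUP BY key 'name'":
-- name is neither aggregated nor listed in GROUP BY.
-- SELECT name, COUNT(*) FROM employees;

-- Works: one output row per distinct name, with the number of rows for that name.
SELECT name, COUNT(*) AS cnt
FROM employees
GROUP BY name;
```

Note that the grouped form returns one count per group, not a single total for the whole table.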

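For your question, the query should look like this. GROUP BY has to come before ORDER BY, and ORDER BY can then only use grouped columns, so date is dropped from it. Note that total is the number of rows in each group:

select session_id, sol_id, id, session_context_code, count(*) as total
    from (
        select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id,date) as rn,
        substr(case_id,2,9) as id

        from df.t1_data
         )undup
        where undup.rn =1 
        -- group the non-aggregated columns first, then sort the grouped result
        GROUP BY session_id, sol_id, id, session_context_code
        order by session_id, sol_id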
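
If the goal is the overall number of rows left after deduplication, rather than a count per group, a window count may be a closer fit. The following is only a sketch under the assumption that your Hive version accepts an empty OVER () clause; the table and column names are taken from the question, and the ORDER BY is reduced to selected columns.

```sql
-- Sketch: keep one row per (session_id, sol_id, date) and attach the
-- number of surviving rows to every output row as `total`.
select session_id, sol_id, id, session_context_code,
       count(*) over () as total   -- window is computed after the WHERE filter
    from (
        select *, ROW_NUMBER() OVER (PARTITION BY session_id, sol_id, date) as rn,
        substr(case_id,2,9) as id
        from df.t1_data
         ) undup
    where undup.rn = 1
    order by session_id, sol_id
```

Because the window function runs on the rows that pass rn = 1, every row carries the same total and no GROUP BY is needed. If only the single number is wanted, selecting count(*) alone from the same subquery with the same filter gives it as a one-row result.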

You can read more HERE

Hope this helps!
