Window function error: at least 1 group must only depend on input columns

I have a common table expression (CTE) with a window function, and I keep getting this error message:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 82:6 Invalid column reference 'gcr_amt' in definition of CTE pro_orders [select o.shopper_id as pro_shopper_id, date_format(o.order_date,'YYYYMM') as ym_order, sum(o.gcr_amt) as total_gcr, sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt end) as new_gcr, sum(o.gcr_amt) over (...) as 12months_direct_gcr from dp_enterprise.uds_order o inner join combined_shopper_level_data cs on cs.pro_shopper_id = o.shopper_id and cs.year_month = date_format(o.order_date,'YYYYMM') where o.exclude_reason_desc is Null group by o.shopper_id, o.order_date] used as po at Line 83:5

My CTE looks like this:

pro_orders as (
  select  o.shopper_id as pro_shopper_id,
          date_format(o.order_date, 'YYYYMM') as ym_order,
          sum(o.gcr_amt) as total_gcr,
          sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt end) as new_gcr,
          sum(o.gcr_amt) over (partition by o.shopper_id, cs.year_month order by cs.year_month desc rows between 12 preceding and 0 following) as 12months_direct_gcr
  from dp_enterprise.uds_order o
  right join combined_shopper_level_data cs on cs.pro_shopper_id = o.shopper_id and cs.year_month = date_format(o.order_date, 'YYYYMM')
  group by o.shopper_id, o.order_date
),

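I don't use window functions often, so maybe my syntax is off. In plain English, what I'm trying to do is get a rolling 12-month total of the metric "gcr".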

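So for the row for shopper_id 123abc in year_month 201901, I want the sum of gcr over the previous 11 months plus the current row, i.e. 12 months in total. I'm not sure whether my window function is set up correctly for that.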
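To pin down what the window should return, here is a plain-Python reference for the desired result: for each month, the current month's gcr plus the 11 months before it. It is a sketch that assumes one shopper's monthly totals are already sorted ascending with exactly one entry per calendar month (zero for months without orders); rolling_12_month_sums is a hypothetical helper, not part of the original query.

```python
from collections import deque


def rolling_12_month_sums(monthly_totals):
    """Trailing 12-month sums for one shopper.

    monthly_totals: list of (year_month, gcr) pairs, sorted ascending by
    year_month, with exactly one entry per calendar month.
    Returns (year_month, this month + previous 11 months) pairs.
    """
    window = deque(maxlen=12)  # keeps at most the 12 most recent monthly values
    result = []
    for ym, gcr in monthly_totals:
        window.append(gcr)  # once 12 values are held, the oldest one is dropped
        result.append((ym, sum(window)))
    return result


months = [201800 + m for m in range(1, 13)] + [201901]  # 201801 .. 201901
print(rolling_12_month_sums(list(zip(months, range(1, 14))))[-1])  # (201901, 90)
```

The deque with maxlen=12 discards the oldest month automatically, which is what a "rows between 11 preceding and current row" frame does when the rows are ordered by ascending month.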

The year_month column referenced here is in YYYYMM format, e.g. 201901.
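Because YYYYMM values are not contiguous integers (201901 minus 11 is 201890, which is not a month), any calendar-based "12 months back" logic needs real month arithmetic. A minimal sketch, assuming year_month has been parsed into an int; the helper names are hypothetical:

```python
def ym_to_index(ym):
    """Map a YYYYMM integer (e.g. 201901) to a running month number."""
    year, month = divmod(ym, 100)
    return year * 12 + (month - 1)


def index_to_ym(idx):
    """Inverse of ym_to_index."""
    year, month0 = divmod(idx, 12)
    return year * 100 + month0 + 1


def window_start(ym, months=12):
    """First YYYYMM of the `months`-long window that ends at (and includes) ym."""
    return index_to_ym(ym_to_index(ym) - (months - 1))


print(window_start(201901))  # 201802
```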

Given that goal, is my window function set up correctly?

How can I get past this error message?

Edit: I still get this error message with the following CTE:

pro_orders as (
  select  o.shopper_id as pro_shopper_id,
          cs.year_month,
          sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month then o.gcr_amt else 0 end) as total_gcr,
          sum(case when date_format(o.order_date, 'YYYYMM') = cs.year_month and o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr,
          sum(sum(o.gcr_amt)) over  (partition by o.shopper_id 
                                order by cs.year_month desc 
                                rows between 12 preceding and 0 following) 
                                as 12months_direct_gcr
  from combined_shopper_level_data cs
  left join dp_enterprise.uds_order o on o.shopper_id = cs.pro_shopper_id
  where o.exclude_reason_desc is Null
  group by o.shopper_id, cs.year_month
),

This results in a similar error message:

Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 83:10 Invalid column reference 'gcr_amt' in definition of CTE pro_orders [select o.shopper_id as pro_shopper_id, cs.year_month, sum(case when date_format(o.order_date,'YYYYMM') = cs.year_month then o.gcr_amt else 0 end) as total_gcr, sum(case when date_format(o.order_date,'YYYYMM') = cs.year_month and o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr, sum(sum(o.gcr_amt)) over (partition by o.shopper_id order by cs.year_month desc rows between 12 preceding and 0 following) as 12months_direct_gcr from combined_shopper_level_data cs left join dp_enterprise.uds_order o on o.shopper_id = cs.pro_shopper_id where o.exclude_reason_desc is Null group by o.shopper_id, cs.year_month] used as po at Line 87:5

Tags: sql, hiveql

Answer:

You have an aggregation query, so the window function looks a bit funny. The basic idea is this:

sum(sum(o.gcr_amt)) over (partition by o.shopper_id, cs.year_month
                          order by cs.year_month desc
                          rows between 12 preceding and 0 following
                         ) as 12months_direct_gcr

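This still won't work. First, the order by column (cs.year_month) is also in the partition by, so every partition holds a single month and there is nothing to roll over. Second, cs.year_month is not in the group by.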
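The first problem is easy to see on toy data. This sketch uses Python's built-in sqlite3 module as a stand-in for Hive (it assumes SQLite 3.25 or newer, which added window functions); the monthly table and its values are made up. Partitioning by the month makes every partition a single row, so the "rolling" sum just repeats each month's value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one pre-aggregated row per shopper and month.
conn.execute("create table monthly (shopper_id text, year_month integer, gcr real)")
conn.executemany(
    "insert into monthly values (?, ?, ?)",
    [("123abc", 201811, 10.0), ("123abc", 201812, 20.0), ("123abc", 201901, 30.0)],
)

# Month is in the partition: each partition has one row, nothing accumulates.
by_month = conn.execute("""
    select year_month,
           sum(gcr) over (partition by shopper_id, year_month
                          order by year_month
                          rows between 11 preceding and current row) as rolling
    from monthly order by year_month
""").fetchall()

# Shopper only in the partition: the frame can reach earlier months.
by_shopper = conn.execute("""
    select year_month,
           sum(gcr) over (partition by shopper_id
                          order by year_month
                          rows between 11 preceding and current row) as rolling
    from monthly order by year_month
""").fetchall()

print(by_month)    # [(201811, 10.0), (201812, 20.0), (201901, 30.0)]
print(by_shopper)  # [(201811, 10.0), (201812, 30.0), (201901, 60.0)]
```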

Assuming there is a row for every month, you can use:

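sum(sum(o.gcr_amt)) over (partition by o.shopper_id
                          order by cs.year_month
                          -- current month plus the 11 months before it (12 rows);
                          -- with "order by ... desc", the "preceding" rows would be later months
                          rows between 11 preceding and current row
                         ) as 12months_direct_gcr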

and put cs.year_month in the group by (other parts of the query may need adjusting).
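The nested-aggregate pattern can be tried out on toy data: the window is evaluated after group by, so it sums the per-group sum(gcr) values. Again SQLite (3.25+) stands in for Hive and the orders table is hypothetical (year_month is precomputed instead of derived with date_format). The frame is the current row plus 11 preceding rows in ascending month order, matching the 12-month goal in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (shopper_id text, year_month integer, gcr real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("123abc", 201811, 5.0), ("123abc", 201811, 5.0),
        ("123abc", 201812, 20.0),
        ("123abc", 201901, 10.0), ("123abc", 201901, 20.0),
        ("999zzz", 201901, 7.0),
    ],
)

rows = conn.execute("""
    select shopper_id,
           year_month,
           sum(gcr) as total_gcr,
           -- the window runs after GROUP BY, so its argument must itself be an
           -- aggregate or a grouped column, hence sum(sum(gcr))
           sum(sum(gcr)) over (partition by shopper_id
                               order by year_month
                               rows between 11 preceding and current row) as rolling_12m
    from orders
    group by shopper_id, year_month
    order by shopper_id, year_month
""").fetchall()

for row in rows:
    print(row)
```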

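For readability, I also suggest using left join rather than right join. For me (and most people), "keep all rows in the first table I just read" is much easier to follow than "keep all rows in some table at the end of the from clause".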
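One detail worth checking with a left join from the shopper-month table: the grouping key should come from the preserved (left) table. In this SQLite sketch with hypothetical stand-in tables cs and o, grouping by o.shopper_id turns months without orders into NULL groups that merge across shoppers, while grouping by cs.pro_shopper_id keeps one row per shopper and month:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-ins for combined_shopper_level_data and uds_order.
conn.execute("create table cs (pro_shopper_id text, year_month integer)")
conn.execute("create table o (shopper_id text, year_month integer, gcr real)")
conn.executemany("insert into cs values (?, ?)",
                 [("123abc", 201812), ("123abc", 201901),
                  ("456def", 201901), ("789ghi", 201901)])
conn.execute("insert into o values ('123abc', 201901, 30.0)")

# {key} is filled from a fixed list below, never from user input.
query = """
    select {key}, cs.year_month, sum(coalesce(o.gcr, 0.0)) as total_gcr
    from cs left join o
      on o.shopper_id = cs.pro_shopper_id and o.year_month = cs.year_month
    group by {key}, cs.year_month
    order by 1, 2
"""

# Key from the optional side: unmatched rows have o.shopper_id = NULL.
by_o_key = conn.execute(query.format(key="o.shopper_id")).fetchall()
# Key from the preserved side: every shopper-month keeps its identity.
by_cs_key = conn.execute(query.format(key="cs.pro_shopper_id")).fetchall()

print(by_o_key)   # [(None, 201812, 0.0), (None, 201901, 0.0), ('123abc', 201901, 30.0)]
print(by_cs_key)
```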

Edit:

I think the full query is:

with pro_orders as (
      select cs.pro_shopper_id,
             cs.year_month,
             sum(coalesce(o.gcr_amt, 0)) as total_gcr,
             sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr,
             -- current month plus the 11 preceding months: 12 rows in ascending order
             sum(sum(o.gcr_amt)) over (partition by cs.pro_shopper_id
                                       order by cs.year_month
                                       rows between 11 preceding and current row
                                      ) as 12months_direct_gcr
      from combined_shopper_level_data cs left join
           dp_enterprise.uds_order o
           on o.shopper_id = cs.pro_shopper_id and
              -- 'yyyy' is the calendar year; 'YYYY' is the week-based year in Java date patterns
              date_format(o.order_date, 'yyyyMM') = cs.year_month and
              o.exclude_reason_desc is Null
      -- group by the preserved (left) table's key so months without orders keep their shopper id
      group by cs.pro_shopper_id, cs.year_month
     ),
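This query puts o.exclude_reason_desc is Null in the on clause instead of a where clause. The difference matters for a rows-based frame, which needs one row per month. A sketch on hypothetical SQLite tables (exclude_reason mirrors exclude_reason_desc), where the only order in 201812 is excluded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cs (pro_shopper_id text, year_month integer)")
conn.execute("create table o (shopper_id text, year_month integer, gcr real, exclude_reason text)")
conn.executemany("insert into cs values (?, ?)", [("123abc", 201812), ("123abc", 201901)])
conn.executemany("insert into o values (?, ?, ?, ?)",
                 [("123abc", 201812, 50.0, "fraud"),  # the only 201812 order is excluded
                  ("123abc", 201901, 30.0, None)])

# Placeholders are filled from fixed strings below, never from user input.
base = """
    select cs.year_month, sum(coalesce(o.gcr, 0.0)) as total_gcr
    from cs left join o
      on o.shopper_id = cs.pro_shopper_id and o.year_month = cs.year_month {on_extra}
    {where}
    group by cs.year_month
    order by cs.year_month
"""

# Filter in WHERE: applied after the join, so the 201812 row (matched to an
# excluded order) is thrown away and the month disappears.
in_where = conn.execute(base.format(on_extra="", where="where o.exclude_reason is null")).fetchall()
# Filter in ON: the excluded order just fails to match, so 201812 survives
# with NULL order columns, which coalesce turns into 0.
in_on = conn.execute(base.format(on_extra="and o.exclude_reason is null", where="")).fetchall()

print(in_where)  # [(201901, 30.0)]
print(in_on)     # [(201812, 0.0), (201901, 30.0)]
```

With the filter in where, 201812 vanishes, so a 12-row window would silently span 13 calendar months.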

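Hive may have a restriction on using window functions in aggregation queries (that would surprise me, because the two are processed separately), but I couldn't find a specific reference. If it does, just use a subquery: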

with pro_orders as (
      select pro_shopper_id, year_month, total_gcr, new_gcr,
             -- the outer query is not aggregated, so a plain window sum over total_gcr is enough
             sum(total_gcr) over (partition by pro_shopper_id
                                  order by year_month
                                  rows between 11 preceding and current row
                                 ) as 12months_direct_gcr
      from (select cs.pro_shopper_id,
                   cs.year_month,
                   sum(coalesce(o.gcr_amt, 0)) as total_gcr,
                   sum(case when o.product_pnl_new_renewal_name = 'New Purchase' then o.gcr_amt else 0 end) as new_gcr
          from combined_shopper_level_data cs left join
               dp_enterprise.uds_order o
               on o.shopper_id = cs.pro_shopper_id and
                  date_format(o.order_date, 'yyyyMM') = cs.year_month and
                  o.exclude_reason_desc is Null
          group by cs.pro_shopper_id, cs.year_month
         ) ps
     ),
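The subquery form can be checked the same way: aggregate to one row per shopper and month in the inner query, then apply a plain (non-nested) window sum in the outer query. This SQLite (3.25+) sketch uses a hypothetical orders table with one order per month from 201801 to 201901 plus a second order in 201901:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (shopper_id text, year_month integer, gcr real)")
months = [201800 + m for m in range(1, 13)] + [201901]  # 201801 .. 201901
conn.executemany("insert into orders values (?, ?, ?)",
                 [("123abc", ym, 1.0) for ym in months] + [("123abc", 201901, 1.0)])

rows = conn.execute("""
    select pro_shopper_id, year_month, total_gcr,
           -- outer query is not aggregated: a plain window sum is enough
           sum(total_gcr) over (partition by pro_shopper_id
                                order by year_month
                                rows between 11 preceding and current row) as rolling_12m
    from (select shopper_id as pro_shopper_id, year_month, sum(gcr) as total_gcr
          from orders
          group by shopper_id, year_month) ps
    order by year_month
""").fetchall()

print(rows[-1])  # ('123abc', 201901, 2.0, 13.0)
```

For 201901 the frame drops 201801 and picks up the extra order: 11 months at 1.0 plus 2.0 gives 13.0.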
© www.soinside.com 2019 - 2024. All rights reserved.