Need help rewriting this query


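We have a query in production that runs daily. It performs many joins and also uses window functions in Hive.

We tried adding a few configuration settings, but that did not help much.

The structure looks like this: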

SELECT
        C.f1, C.f2, A.f2 ...
FROM (
    SELECT * FROM (
        SELECT T1.*, B.atid, B.a_id,
        ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
        FROM T1 AS T1
        JOIN T5 ON T1.t_dt = T5.t_dt
        JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
        LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
        ON T1.TYP = PV.p_cd
        WHERE T1.state not in ("INVALID")
        AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
        AND ISNULL(PV.p_cd)
    ) T
    WHERE T.rank_ = 1
) A

JOIN (SELECT *, row_number() over (partition by ac_id order by b_ts desc) rank_  
      FROM T4
      WHERE event not in ('CT','UPD')
     ) AS C
  ON A.a_id = C.a_id
AND A.atid = C.ac_id
AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt
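  • Since I cannot drop any of the tables (or joins), my plan was to replace the window function with the aggregate function max plus another join, but I was unable to rewrite it that way.
  • I am also not sure this would actually improve performance, so any guidance would help us.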
sql hive query-optimization hiveql
1 Answer

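Analytic functions usually perform better than a join against a select max subquery: with an analytic function the table is read only once, and the row_number computation is parallelized by the partition by keys.

Try regrouping the joins and filters.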
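For comparison, here is a sketch of both forms applied to the T4 subquery from the question. It assumes b_ts is unique within each ac_id: with ties, the max form returns every tied row, while row_number keeps exactly one. Rows with a NULL ac_id also drop out of the join form. The alias M and the column max_b_ts are illustrative names, not part of the original query.

```sql
-- Form 1: window function. T4 is scanned once.
SELECT *
FROM (
    SELECT T4.*,
           ROW_NUMBER() OVER (PARTITION BY ac_id ORDER BY b_ts DESC) AS rank_
    FROM T4
    WHERE event NOT IN ('CT', 'UPD')
) C
WHERE C.rank_ = 1;

-- Form 2: MAX plus self-join. T4 is scanned twice:
-- once for the aggregate, once to fetch the full rows.
SELECT C.*
FROM T4 C
JOIN (
    SELECT ac_id, MAX(b_ts) AS max_b_ts   -- hypothetical column alias
    FROM T4
    WHERE event NOT IN ('CT', 'UPD')
    GROUP BY ac_id
) M
  ON C.ac_id = M.ac_id
 AND C.b_ts  = M.max_b_ts
WHERE C.event NOT IN ('CT', 'UPD');
```

The second form needs an extra scan of T4 and an extra join, which is why it is rarely the faster choice.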

The join

LEFT OUTER JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
        ON T1.TYP = PV.p_cd

combined with the condition ISNULL(PV.p_cd) removes some rows from T1. So do these conditions:

WHERE T1.state not in ("INVALID")
        AND T1.evt_name NOT IN ('INACTIVE','DORMANT')

Move this join into a subquery. If it filters out a lot of rows, this may help shrink the T1 dataset before all the other joins and row_number():

(select T1.* from T1 
             left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV 
                       ON T1.TYP = PV.p_cd 
 where T1.state not in ("INVALID")
        AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
        AND ISNULL(PV.p_cd)
) as T1 
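Whether this pays off depends on how many rows the filters remove. One way to check is a count query like the sketch below, which reports the size of T1 before and after the filters. The column aliases total_rows and kept_rows are illustrative, and the DISTINCT is an added safeguard in case T3 holds duplicate codes.

```sql
-- Compare the size of T1 before and after the filters.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN T1.state NOT IN ('INVALID')
                 AND T1.evt_name NOT IN ('INACTIVE', 'DORMANT')
                 AND PV.p_cd IS NULL
                THEN 1 ELSE 0 END) AS kept_rows
FROM T1
LEFT JOIN (
    -- DISTINCT keeps duplicate codes from inflating the counts
    SELECT DISTINCT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD'
) PV
  ON T1.TYP = PV.p_cd;
```

If kept_rows is only a small fraction of total_rows, filtering first should noticeably cut the data that reaches the later joins and the row_number step.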

Also, the first row_number is computed over columns from tables T1 and B only:

PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC

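Consider joining T5 after the row_number filter. If that join is heavy and the row_number filter shrinks the dataset, wrap the row_number and its filter in one more subquery, then join T5 to the filtered result: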

(--filtered by row_number
select * from
(
 SELECT T1.*, B.atid, B.a_id,
        ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid ORDER BY T1.b_ts DESC) AS RANK_
  from
    (select T1.* from T1 
                 left join (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV 
                           ON T1.TYP = PV.p_cd 
     where T1.state not in ("INVALID")
            AND T1.evt_name NOT IN ('INACTIVE','DORMANT')
            AND ISNULL(PV.p_cd)
    ) as T1 JOIN T2 B ON T1.wtid = B.wtid and T1.b_ts = B.b_ts
) T WHERE T.rank_ = 1
) T --filtered
JOIN T5 ON T.t_dt = T5.t_dt  --the outer alias is T; T1 is not visible at this level

Depending on your data, this may help.
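Putting the suggestions together, the full query might look like the sketch below. The select list is left as elided in the question. It assumes every row that survives the rank filter has exactly one match in T5. If the join to T5 also filters or duplicates rows, deferring it past row_number can change which row ranks first, so compare the results against the original query before switching.

```sql
SELECT
    C.f1, C.f2, A.f2   -- remaining columns as in the original query
FROM (
    SELECT T.*
    FROM (
        SELECT T1.*, B.atid, B.a_id,
               ROW_NUMBER() OVER (PARTITION BY T1.wtid, B.atid
                                  ORDER BY T1.b_ts DESC) AS rank_
        FROM (
            -- T1 reduced by its own filters before any other join
            SELECT T1.*
            FROM T1
            LEFT JOIN (SELECT p_cd FROM T3 WHERE PV_TY_CD = 'ORIG_CD') PV
              ON T1.TYP = PV.p_cd
            WHERE T1.state NOT IN ('INVALID')
              AND T1.evt_name NOT IN ('INACTIVE', 'DORMANT')
              AND PV.p_cd IS NULL
        ) T1
        JOIN T2 B
          ON T1.wtid = B.wtid
         AND T1.b_ts = B.b_ts
    ) T
    WHERE T.rank_ = 1
) A
-- T5 is joined only after the rank filter has reduced the rows
JOIN T5 ON A.t_dt = T5.t_dt
JOIN (
    SELECT T4.*,
           ROW_NUMBER() OVER (PARTITION BY ac_id ORDER BY b_ts DESC) AS rank_
    FROM T4
    WHERE event NOT IN ('CT', 'UPD')
) C
  ON A.a_id = C.a_id
 AND A.atid = C.ac_id
 AND C.rank_ = 1
JOIN T6 ON C.t_dt = T6.t_dt;
```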

See also: https://stackoverflow.com/a/51061613/2700344
