Redshift SQL query optimization


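I have a query that takes more than 15 minutes to run in Redshift. It is triggered from an AWS Lambda function whose timeout is 15 minutes, so I would like to know whether the query can be optimized to return results faster.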

Here is my SQL query:

 insert into
  test.qa_locked
select
  '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481',
  'ABC-013505',
  'ABC-013505-2-2020',
  contact_id,
  cast(TIMEOFDAY() as timestamp)
from
  (
    select
      contact_id
    from
      (
        select
          *
        from
          (
            select
              '1759' as de_id,
              xa.contact_id,
              xa.email_id,
              xa.employee_profile_link,
              xa.phone_number,
              xa.phone_line,
              xa.first_name,
              xa.last_name,
              xa.title,
              xa.primary_function,
              xa.secondary_function,
              xa.role,
              xa.e_domain,
              xa.flc,
              xa.fln,
              xa.address,
              xa.city,
              xa.state,
              xa.country,
              xa.zip_code,
              ya.account_id,
              xa.is_contact_suppressed,
              xa.is_email_suppressed,
              xa.email_suppression_lob,
              xa.is_email_soft_bounce,
              xa.is_tele_suppressed,
              xa.tele_suppression_lob,
              xa.active_type,
              xa.is_sv1_verified,
              xa.last_sv1_verified,
              xa.is_email_verified,
              xa.last_email_verified,
              xa.is_tele_verified,
              xa.last_tele_verified,
              ya.company_profile_link,
              ya.company_name,
              ya.website,
              ya.employees,
              ya.employee_range,
              ya.revenue,
              ya.revenue_range,
              ya.primary_industry,
              ya.sub_industry,
              ya.sic_code,
              ya.nic_cide,
              ya.is_company_suppressed,
              ya.is_sv2_verified,
              ya.last_sv2_verified,
              rank() over (
                partition by lower(xa.e_domain)
                order by
                  xa.contact_id,
                  lower(xa.e_domain)
              ) contact_cnt
            from
              contacts xa
              left join accounts ya on xa.account_id = ya.account_id
            where
              xa.lob = 'ABC'
              and xa.is_contact_suppressed = 0
              and (
                UPPER(xa.email_suppression_lob) <> 'ABC'
                or UPPER(xa.email_suppression_lob) <> 'BOTH'
                or UPPER(xa.email_suppression_lob) is null
              )
              and xa.is_email_verified = 1
              and xa.is_email_suppressed = 0
              and (
                (
                  lower(xa.primary_function) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'primary_function'
                      and relation_id = 4
                  )
                  and lower(xa.role) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'role'
                      and relation_id = 4
                  )
                  and lower(xa.title) in (
                    select
                      lower(title)
                    from
                      contacts con
                      inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                        both ' '
                        from
                          camp.param_val
                      ) || '%'
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'title'
                      and relation_id = 4
                  )
                )
                or (
                  lower(xa.primary_function) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'primary_function'
                      and relation_id = 2
                  )
                  and lower(xa.role) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'role'
                      and relation_id = 2
                  )
                  and lower(xa.title) in (
                    select
                      lower(title)
                    from
                      contacts con
                      inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                        both ' '
                        from
                          camp.param_val
                      ) || '%'
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'title'
                      and relation_id = 2
                  )
                )
                or (
                  lower(xa.primary_function) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'primary_function'
                      and relation_id = 1
                  )
                  and lower(xa.role) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'role'
                      and relation_id = 1
                  )
                  and lower(xa.title) in (
                    select
                      lower(title)
                    from
                      contacts con
                      inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                        both ' '
                        from
                          camp.param_val
                      ) || '%'
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'title'
                      and relation_id = 1
                  )
                )
                or (
                  lower(xa.primary_function) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'primary_function'
                      and relation_id = 3
                  )
                  and lower(xa.role) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'role'
                      and relation_id = 3
                  )
                  and lower(xa.title) in (
                    select
                      lower(title)
                    from
                      contacts con
                      inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                        both ' '
                        from
                          camp.param_val
                      ) || '%'
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'title'
                      and relation_id = 3
                  )
                )
                or (
                  lower(xa.primary_function) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'primary_function'
                      and relation_id = 5
                  )
                  and lower(xa.role) in (
                    select
                      lower(param_val)
                    from
                      ce_campaign_spec_tb
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'role'
                      and relation_id = 5
                  )
                  and lower(xa.title) in (
                    select
                      lower(title)
                    from
                      contacts con
                      inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                        both ' '
                        from
                          camp.param_val
                      ) || '%'
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and param = 'title'
                      and relation_id = 5
                  )
                )
              )
              and lower(ya.sic_code) NOT IN (
                select
                  distinct lower(sic_code)
                from
                  accounts con
                  inner join campaign_exclusion_list cam on lower(con.sic_code) ilike '%' || cam.exclusion_value || '%'
                where
                  cam.job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                  and UPPER(exclusion_type) = 'SIC'
                  and lower(exclusion_value) not in (
                    select
                      distinct lower(inclusion_value)
                    from
                      campaign_inclusion_list
                    where
                      job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                      and UPPER(inclusion_type) = 'SIC'
                  )
              )
              and xa.contact_id not in (
                select
                  contact_id
                from
                  bh_leads
                where
                  (CURRENT_DATE - creation_date :: date) <= 60
                  and UPPER(LOB) = 'ABC'
                  and agency_id = '1002'
              )
              and xa.contact_id not in (
                select
                  contact_id
                from
                  bh_leads
                where
                  (CURRENT_DATE - creation_date :: date) <= 60
                  and UPPER(LOB) = 'ABC'
                  and sponsor_id = '8306'
              )
              AND email_id NOT IN (
                select
                  email_id
                from
                  bh_email_open_clicks
                where
                  sf_oms_campaign_id = 'ABC-013505-2-2020'
              )
              AND contact_id NOT IN (
                select
                  contact_id
                from
                  campaign_extraction_history
                where
                  sf_oms_campaign_id = 'ABC-013505-2-2020'
                  and sf_campaign_id = 'ABC-013505'
                  and (CURRENT_DATE - creation_date :: date) < 1
                  and channel = 'BOTH'
                  and (
                    UPPER(STATUS) = 'EXTRACTED'
                    OR UPPER(STATUS) = 'LAUNCHED'
                    OR UPPER(STATUS) = 'CONFIRMED'
                  )
              )
              AND contact_id NOT IN (
                select
                  contact_id
                from
                  campaign_extraction_history
                where
                  creation_date :: date = CURRENT_DATE
                  and channel = 'BOTH'
                  and (
                    UPPER(STATUS) = 'EXTRACTED'
                    OR UPPER(STATUS) = 'LAUNCHED'
                    OR UPPER(STATUS) = 'CONFIRMED'
                  )
                group by
                  contact_id
                having
                  count(*) > 10
              )
              AND contact_id NOT IN (
                select
                  contact_id
                from
                  campaign_extraction_history
                where
                  sf_campaign_id = 'ABC-013505'
                  and channel = 'BOTH'
                  and (
                    UPPER(STATUS) = 'EXTRACTED'
                    OR UPPER(STATUS) = 'LAUNCHED'
                    OR UPPER(STATUS) = 'CONFIRMED'
                  )
                group by
                  contact_id
                having
                  count(*) >= 3
              )
              and e_domain not in (
                select
                  "domain"
                from
                  bh_leads
                where
                  sf_campaign_id = 'ABC-013505'
                group by
                  "domain"
                having
                  count(*) >= 1
              )
              AND contact_id NOT IN (
                select
                  contact_id
                from
                  bh_leads
                where
                  agency_id = 1002
                  and sf_campaign_id = 'ABC-013505'
                  and (CURRENT_DATE - creation_date :: date) <= 180
              )
              and flc not in (
                select
                  distinct flc
                from
                  contacts
                where
                  is_tele_suppressed = 1
                  and (
                    tele_suppression_lob = 'ABC'
                    or tele_suppression_lob = 'BOTH'
                  )
              )
              and email_id not in (
                select
                  distinct email_id
                from
                  contacts
                where
                  is_email_suppressed = 1
              )
              and (
                xa.e_domain not ilike '%.gov%'
                and xa.e_domain not ilike '%.mil%'
              )
              and (
                xa.email_id not ilike '%@%.gov%'
                and xa.email_id not ilike '%@%.mil%'
              )
              and contact_id not in (
                select
                  contact_id
                from
                  test.qa_locked
              )
          )
        where
          contact_cnt <= 1
      )
  )

Here is the plan:

XN Subquery Scan "*SELECT*" (cost=1000028198481.69..1000028198481.75 rows=1 width=218)
     ->  XN Subquery Scan derived_table1 (cost=1000028198481.69..1000028198481.73 rows=1 width=210)
         ->  XN Window (cost=1000028198481.69..1000028198481.71 rows=1 width=56)
             ->  XN Sort (cost=1000028198481.69..1000028198481.70 rows=1 width=56)
                 ->  XN Network (cost=1645148.05..28198481.68 rows=1 width=56)
                     ->  XN Hash NOT IN Join DS_DIST_OUTER (cost=1645148.05..28198481.68 rows=1 width=56)
                         ->  XN Hash NOT IN Join DS_DIST_INNER (cost=1645147.76..28091814.71 rows=1 width=56)
                             ->  XN Hash NOT IN Join DS_DIST_INNER (cost=1645147.09..7491814.01 rows=1 width=56)
                                 ->  XN Hash NOT IN Join DS_DIST_INNER (cost=1645146.68..6805146.91 rows=1 width=56)
                                     ->  XN Hash NOT IN Join DS_DIST_INNER (cost=1645146.16..6438479.71 rows=1 width=56)
                                         ->  XN Hash NOT IN Join DS_DIST_NONE (cost=1645145.65..6071812.51 rows=1 width=56)
                                             ->  XN Hash NOT IN Join DS_DIST_NONE (cost=1645145.29..6071812.13 rows=1 width=56)
                                                 ->  XN Hash NOT IN Join DS_DIST_BOTH (cost=1645144.96..6071811.77 rows=1 width=56)
                                                     ->  XN Hash NOT IN Join DS_DIST_NONE (cost=1645144.50..5598477.96 rows=1 width=56)
                                                         ->  XN Hash NOT IN Join DS_DIST_BOTH (cost=1645144.47..5598477.91 rows=1 width=84)
                                                             ->  XN Hash NOT IN Join DS_DIST_OUTER (cost=1645142.59..5078476.00 rows=1 width=84)
                                                                 ->  XN Hash NOT IN Join DS_BCAST_INNER (cost=1645142.57..4065142.63 rows=1 width=600)
                                                                     ->  XN Hash Left Join DS_DIST_BOTH (cost=1201145.21..3221145.24 rows=1 width=1116)
                                                                         ->  XN Seq Scan on contacts xa (cost=1201145.21..1201145.21 rows=1 width=640)
                                                                         ->  XN Hash (cost=0.00..0.00 rows=1 width=556)
                                                                             ->  XN Seq Scan on accounts ya (cost=0.00..0.00 rows=1 width=556)
                                                                     ->  XN Hash (cost=443997.35..443997.35 rows=1 width=32)
                                                                         ->  XN Subquery Scan "IN_subquery" (cost=23989.76..443997.35 rows=1 width=32)
                                                                             ->  XN Unique (cost=23989.76..443997.34 rows=1 width=516)
                                                                                 ->  XN Nested Loop DS_BCAST_INNER (cost=23989.76..443997.34 rows=1 width=516)
                                                                                     ->  XN Seq Scan on accounts con (cost=0.00..0.00 rows=1 width=516)
                                                                                     ->  XN Hash NOT IN Join DS_DIST_OUTER (cost=23989.76..83997.32 rows=1 width=26)
                                                                                         ->  XN Seq Scan on campaign_exclusion_list cam (cost=0.00..7.53 rows=1 width=26)
                                                                                         ->  XN Hash (cost=23989.75..23989.75 rows=1 width=32)
                                                                                             ->  XN Subquery Scan "IN_subquery" (cost=0.00..23989.75 rows=1 width=32)
                                                                                                 ->  XN Unique (cost=0.00..23989.74 rows=1 width=18)
                                                                                                     ->  XN Seq Scan on campaign_inclusion_list (cost=0.00..23989.74 rows=1 width=18)
                                                                 ->  XN Hash (cost=0.01..0.01 rows=1 width=516)
                                                                     ->  XN Subquery Scan "IN_subquery" (cost=0.00..0.01 rows=1 width=516)
                                                                         ->  XN Unique (cost=0.00..0.00 rows=1 width=516)
                                                                             ->  XN Seq Scan on contacts (cost=0.00..0.00 rows=1 width=516)
                                                             ->  XN Hash (cost=1.88..1.88 rows=1 width=210)
                                                                 ->  XN Seq Scan on bh_email_open_clicks (cost=0.00..1.88 rows=1 width=210)
                                                         ->  XN Hash (cost=0.01..0.01 rows=1 width=210)
                                                             ->  XN Subquery Scan "IN_subquery" (cost=0.00..0.01 rows=1 width=210)
                                                                 ->  XN Unique (cost=0.00..0.00 rows=1 width=28)
                                                                     ->  XN Seq Scan on contacts (cost=0.00..0.00 rows=1 width=28)
                                                     ->  XN Hash (cost=0.45..0.45 rows=1 width=210)
                                                         ->  XN Seq Scan on bh_leads (cost=0.00..0.45 rows=1 width=210)
                                                 ->  XN Hash (cost=0.32..0.32 rows=1 width=402)
                                                     ->  XN Subquery Scan "IN_subquery" (cost=0.30..0.32 rows=1 width=402)
                                                         ->  XN HashAggregate (cost=0.30..0.31 rows=1 width=402)
                                                             ->  XN Seq Scan on campaign_extraction_history (cost=0.00..0.30 rows=1 width=402)
                                             ->  XN Hash (cost=0.35..0.35 rows=1 width=402)
                                                 ->  XN Subquery Scan "IN_subquery" (cost=0.33..0.35 rows=1 width=402)
                                                     ->  XN HashAggregate (cost=0.33..0.34 rows=1 width=402)
                                                         ->  XN Seq Scan on campaign_extraction_history (cost=0.00..0.33 rows=1 width=402)
                                         ->  XN Hash (cost=0.50..0.50 rows=1 width=210)
                                             ->  XN Seq Scan on bh_leads (cost=0.00..0.50 rows=1 width=210)
                                     ->  XN Hash (cost=0.50..0.50 rows=1 width=210)
                                         ->  XN Seq Scan on bh_leads (cost=0.00..0.50 rows=1 width=210)
                                 ->  XN Hash (cost=0.40..0.40 rows=1 width=402)
                                     ->  XN Seq Scan on campaign_extraction_history (cost=0.00..0.40 rows=1 width=402)
                             ->  XN Hash (cost=0.30..0.30 rows=30 width=402)
                                 ->  XN Seq Scan on ce_locked_records_tb (cost=0.00..0.30 rows=30 width=402)
                         ->  XN Hash (cost=0.27..0.27 rows=1 width=210)
                             ->  XN Subquery Scan "IN_subquery" (cost=0.26..0.27 rows=1 width=210)
                                 ->  XN HashAggregate (cost=0.26..0.26 rows=1 width=210)
                                     ->  XN Seq Scan on bh_leads (cost=0.00..0.25 rows=1 width=210)

Please suggest whether there is any way to optimize this query.

sql amazon-redshift
1 Answer

This looks like a query that has been added to again and again, leaving a lot of duplicated code and many unnecessary table scans.

Bear in mind that my experience is mainly with MSSQL rather than Redshift, but most of the same principles apply.

            (
              lower(xa.primary_function) in (
                select
                  lower(param_val)
                from
                  ce_campaign_spec_tb
                where
                  job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                  and param = 'primary_function'
                  and relation_id = 4
              )
              and lower(xa.role) in (
                select
                  lower(param_val)
                from
                  ce_campaign_spec_tb
                where
                  job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                  and param = 'role'
                  and relation_id = 4
              )
              and lower(xa.title) in (
                select
                  lower(title)
                from
                  contacts con
                  inner join ce_campaign_spec_tb camp on lower(con.title) ilike '%' || trim(
                    both ' '
                    from
                      camp.param_val
                  ) || '%'
                where
                  job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
                  and param = 'title'
                  and relation_id = 4
              )
            )

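Without knowing what this is meant to do, you appear to repeat this block of code five times, and the only thing that changes is relation_id. You start with ID 4, then 2, then 1, then 3, then 5, while everything else seems to stay the same. There may be subtle differences I have missed, but as written you scan the tables five separate times instead of scanning them once with a single predicate. Depending on the size of the tables, that can be a lot of data to scan.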
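As a rough, untested sketch of what a single predicate could look like: the five OR branches can be collapsed into one EXISTS that joins ce_campaign_spec_tb to itself on relation_id. The aliases pf, rl and tt are made up for illustration, and I am assuming job_id, param and relation_id all belong to ce_campaign_spec_tb. I am also assuming the inner join back to contacts in the title subquery can be dropped, because xa is already a row of contacts and can be compared against the pattern directly.

```sql
-- Sketch only: one EXISTS instead of five OR branches with three subqueries each.
select xa.contact_id
from contacts xa
where exists (
    select 1
    from ce_campaign_spec_tb pf
    join ce_campaign_spec_tb rl
      on  rl.job_id      = pf.job_id
      and rl.relation_id = pf.relation_id
      and rl.param       = 'role'
    join ce_campaign_spec_tb tt
      on  tt.job_id      = pf.job_id
      and tt.relation_id = pf.relation_id
      and tt.param       = 'title'
    where pf.job_id = '1d8db587-f5ab-41f4-9c2b-c4e21e0c7481'
      and pf.param = 'primary_function'
      and pf.relation_id in (1, 2, 3, 4, 5)
      -- all three conditions must hold for the same relation_id
      and lower(pf.param_val) = lower(xa.primary_function)
      and lower(rl.param_val) = lower(xa.role)
      and xa.title ilike '%' || trim(both ' ' from tt.param_val) || '%'
);
```

Check the result against the original on a small sample before relying on it, and compare the EXPLAIN output to confirm the planner actually produces fewer scans.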

A few lines further on:

          and xa.contact_id not in (
            select
              contact_id
            from
              bh_leads
            where
              (CURRENT_DATE - creation_date :: date) <= 60
              and UPPER(LOB) = 'ABC'
              and agency_id = '1002'
          )
          and xa.contact_id not in (
            select
              contact_id
            from
              bh_leads
            where
              (CURRENT_DATE - creation_date :: date) <= 60
              and UPPER(LOB) = 'ABC'
              and sponsor_id = '8306'
          )

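Again there are two scans of the same table over almost the same data; the only difference is that one checks sponsor_id and the other checks agency_id. This can be done in a single statement instead of two.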
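A minimal sketch of the combined form, reusing the literals from the question. Excluding contacts found in either subquery is the same as excluding contacts found in their union, so the two filters can be merged with an OR:

```sql
-- Sketch only: one scan of bh_leads instead of two.
select xa.contact_id
from contacts xa
where xa.contact_id not in (
    select contact_id
    from bh_leads
    where (CURRENT_DATE - creation_date :: date) <= 60
      and UPPER(LOB) = 'ABC'
      and (agency_id = '1002' or sponsor_id = '8306')
);
```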

Further down:

          and email_id not in (
            select
              distinct email_id
            from
              contacts
            where
              is_email_suppressed = 1
          )

You already reference contacts (as xa) earlier and have this predicate in the where clause:

and xa.is_email_suppressed = 0

I can't be sure of the exact schema of the tables involved, but the two appear to do the same thing.
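Whether the two really are equivalent depends on the data: they differ only if the same email_id appears on both a suppressed and an unsuppressed row of contacts. A quick check, assuming is_email_suppressed is a numeric 0/1 flag as the comparisons in the question suggest:

```sql
-- Sketch only: lists email_id values that are suppressed on one row
-- and not suppressed on another.
select email_id
from contacts
group by email_id
having min(is_email_suppressed) = 0
   and max(is_email_suppressed) = 1;
```

If this returns no rows, the NOT IN subquery should be redundant next to xa.is_email_suppressed = 0. One caveat: a suppressed row with a NULL email_id changes how NOT IN behaves, so check for that too.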

Also, from the Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

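It appears that you can create temporary tables that last for a single session. Most of the subqueries can be prepared in advance so that you join against their result sets. For example, if you first prepare a temporary result set from campaign_extraction_history containing the valid results, you can replace the following predicates with a single left join: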

          AND contact_id NOT IN (
            select
              contact_id
            from
              campaign_extraction_history
            where
              sf_oms_campaign_id = 'ABC-013505-2-2020'
              and sf_campaign_id = 'ABC-013505'
              and (CURRENT_DATE - creation_date :: date) < 1
              and channel = 'BOTH'
              and (
                UPPER(STATUS) = 'EXTRACTED'
                OR UPPER(STATUS) = 'LAUNCHED'
                OR UPPER(STATUS) = 'CONFIRMED'
              )
          )
          AND contact_id NOT IN (
            select
              contact_id
            from
              campaign_extraction_history
            where
              creation_date :: date = CURRENT_DATE
              and channel = 'BOTH'
              and (
                UPPER(STATUS) = 'EXTRACTED'
                OR UPPER(STATUS) = 'LAUNCHED'
                OR UPPER(STATUS) = 'CONFIRMED'
              )
            group by
              contact_id
            having
              count(*) > 10
          )
          AND contact_id NOT IN (
            select
              contact_id
            from
              campaign_extraction_history
            where
              sf_campaign_id = 'ABC-013505'
              and channel = 'BOTH'
              and (
                UPPER(STATUS) = 'EXTRACTED'
                OR UPPER(STATUS) = 'LAUNCHED'
                OR UPPER(STATUS) = 'CONFIRMED'
              )
            group by
              contact_id
            having
              count(*) >= 3
          )
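An untested sketch of that idea follows. The table name excluded_contacts is hypothetical. The three predicates share the channel and STATUS filters, so a single scan with conditional counts can stand in for all three. The temporary table has to be created in the same session that runs the insert.

```sql
-- Step 1: scan campaign_extraction_history once and keep every contact_id
-- that any of the three original predicates would exclude.
create temp table excluded_contacts as
select contact_id
from campaign_extraction_history
where channel = 'BOTH'
  and UPPER(STATUS) in ('EXTRACTED', 'LAUNCHED', 'CONFIRMED')
  and contact_id is not null
group by contact_id
having
     -- first predicate: at least one matching row for this exact campaign
     sum(case when sf_oms_campaign_id = 'ABC-013505-2-2020'
               and sf_campaign_id = 'ABC-013505'
               and (CURRENT_DATE - creation_date :: date) < 1
              then 1 else 0 end) >= 1
     -- second predicate: more than 10 rows created today
  or sum(case when creation_date :: date = CURRENT_DATE
              then 1 else 0 end) > 10
     -- third predicate: 3 or more rows for this sf_campaign_id
  or sum(case when sf_campaign_id = 'ABC-013505'
              then 1 else 0 end) >= 3;

-- Step 2: anti-join; keep only contacts with no match in the temp table.
select xa.contact_id
from contacts xa
left join excluded_contacts ex on ex.contact_id = xa.contact_id
where ex.contact_id is null;
```

Note that the left join treats NULL contact_id values differently from NOT IN, which returns no rows at all when the subquery yields a NULL. The sketch filters NULLs out explicitly for that reason.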

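There are probably more places where you can combine queries and fetch the data from a table only once. For example, you exclude many email_id values, but in different statements and subqueries scattered around the query. They can most likely be handled in a single statement.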
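For the email_id exclusions, one possible shape (again untested, with excluded_emails as a made-up name) is to gather both sources into one temporary table and filter against it once:

```sql
-- Sketch only: collect every excluded email_id in one place.
create temp table excluded_emails as
select email_id
from bh_email_open_clicks
where sf_oms_campaign_id = 'ABC-013505-2-2020'
  and email_id is not null
union
select email_id
from contacts
where is_email_suppressed = 1
  and email_id is not null;

-- A single NOT EXISTS then replaces the two separate NOT IN predicates.
select xa.contact_id
from contacts xa
where not exists (
    select 1
    from excluded_emails ee
    where ee.email_id = xa.email_id
);
```

This still reads each source table once. The saving, if any, comes from the main query having one exclusion join to plan instead of two.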

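Perhaps the best way to improve performance is to ask yourself what the query is supposed to do and what it should exclude, and then simply rewrite the whole query. That may be quite a lot of work, but it will probably be faster in the long run.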
