Redshift - Get each person's first working date. I want a table with that information plus their first/last name, email address, and start date

    with applicants_cte as (
        select person_key
        from applicants.people
        where date between '2022-01-01' and '2023-01-01'
        group by person_key
    ),

    list_of_applicants as (
        select *
        from (
            -- rank each person's rows by working_date; rank 1 = earliest row(s)
            select rank() over (partition by ap.person_id
                                order by ap.working_date asc) as list,
                   ap.person_id,
                   date(ap.start_date) as start_date,
                   a.first_name,
                   a.last_name,
                   b.email_address
            from applicants.people ap
            join information a on a.person_key = ap.id_key
            join contact_information b on b.person_info_id = ap.id_key
            where date(ap.start_date) is not null
              -- assumes applicants_cte.person_key identifies the same person as ap.person_id
              and ap.person_id in (select person_key from applicants_cte)
        ) c
        where c.list = 1
    )

    select count(*) from list_of_applicants;
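
To see what the `rank() ... = 1` filter actually returns, here is a minimal sketch that runs the same pattern on toy data with Python's built-in `sqlite3` (standing in for Redshift). The simplified `people` table and its rows are hypothetical; person 2 deliberately has two rows on the same earliest `working_date`.

```python
import sqlite3

# Hypothetical stand-in for applicants.people, reduced to the columns that matter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER, working_date TEXT, start_date TEXT);
INSERT INTO people VALUES
  (1, '2022-03-01', '2022-02-15'),
  (1, '2022-01-10', '2022-01-05'),
  (2, '2022-05-01', '2022-04-20'),
  (2, '2022-05-01', '2022-04-21'),
  (3, '2022-07-07', '2022-07-01');
""")

# Same shape as list_of_applicants: rank rows per person by working_date,
# then keep only rank 1 in the outer query.
first_rows = conn.execute("""
SELECT person_id, working_date, start_date
FROM (
  SELECT RANK() OVER (PARTITION BY person_id ORDER BY working_date ASC) AS list,
         person_id, working_date, start_date
  FROM people
) c
WHERE c.list = 1
ORDER BY person_id, start_date
""").fetchall()

for row in first_rows:
    print(row)
```

Each person keeps the whole row that has the earliest `working_date`. Because `RANK()` gives tied rows the same rank, person 2 appears twice; `ROW_NUMBER()` would keep exactly one row per person.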

I'm trying to optimize the list_of_applicants CTE by:

  • Removing the subquery
  • Removing rank() and replacing it with a where/group by
  • Removing the order by and replacing it with min(ap.working_date)

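Problems I'm running into:

  • When I run select count(*) from list_of_applicants, my result is off by 200,000
  • I'm not sure how to restructure the CTE once the subquery is removed

When I refactor it, the count differs by 200,000. This is what the refactored code looks like: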

    with applicants_cte as (
        select person_key
        from applicants.people
        where date between '2022-01-01' and '2023-01-01'
        group by person_key
    ),

    list_of_applicants as (
        -- Removed subquery and rank()
        /* ( select *
        from (select rank() over (partition by ap.person_id order by
            ap.working_date asc) list,
        */
        select min(ap.working_date) as first_working_date,
               ap.person_id,
               date(ap.start_date) as start_date,
               a.first_name,
               a.last_name,
               b.email_address
        from applicants.people ap
        join information a on a.person_key = ap.id_key
        join contact_information b on b.person_info_id = ap.id_key
        -- replaced where statement with join
        -- NOTE: this condition compares apps.person_key with itself, so it is
        -- true for every non-NULL row and behaves like a cross join
        join applicants_cte apps on apps.person_key = apps.person_key
        where date(ap.start_date) is not null
        /* where date(applicants.start_date) IS NOT NULL
              and ap.person_id in (select applicant_id
            from applicants_cte)) c
        where c.list = 1
        */
        group by ap.person_id,
                 date(ap.start_date),
                 a.first_name,
                 a.last_name,
                 b.email_address
    )

    select count(*) from list_of_applicants;
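
One likely contributor to the count difference is the join condition `apps.person_key = apps.person_key`, which never references `ap` and so matches every row of `applicants_cte` to every row of `people`. The sketch below demonstrates this with Python's `sqlite3` on a hypothetical three-row table; it does not reproduce the real Redshift data, only the row-multiplication effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_key INTEGER, working_date TEXT);
INSERT INTO people VALUES (1, '2022-01-10'), (1, '2022-03-01'), (2, '2022-05-01');
""")

# Hypothetical stand-in for applicants_cte: one row per person_key.
cte = "WITH applicants_cte AS (SELECT person_key FROM people GROUP BY person_key) "

# Buggy condition: compares the CTE column with itself, so every people row
# joins to every CTE row (3 rows x 2 keys).
buggy = conn.execute(cte + """
SELECT COUNT(*) FROM people ap
JOIN applicants_cte apps ON apps.person_key = apps.person_key
""").fetchone()[0]

# Intended condition: each people row joins only to its own key.
fixed = conn.execute(cte + """
SELECT COUNT(*) FROM people ap
JOIN applicants_cte apps ON apps.person_key = ap.person_key
""").fetchone()[0]

print(buggy, fixed)
```

With these rows the buggy join yields 6 rows and the corrected one yields 3. In the refactored CTE the GROUP BY collapses some of that duplication again, so the final gap also depends on how many distinct name/email/start-date combinations each person has.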
1 answer

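What you're trying to do won't work. The RANK() window function, together with the subquery that keeps only rows where the rank value = 1 and the order by inside the window function, provides specific behavior that min() and group by cannot reproduce. rank() = 1 keeps the entire row (or tied rows) with the earliest working_date, whereas min() with group by returns one row per distinct combination of all the grouped columns, so a person with several start dates or email addresses still produces several rows. I suggest reading up on window functions and how they process data differently from group by.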
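The difference described above can be seen on a tiny example. This sketch uses Python's `sqlite3` as a stand-in for Redshift, with a hypothetical `people` table in which person 1 has two rows with different start dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER, working_date TEXT, start_date TEXT);
INSERT INTO people VALUES
  (1, '2022-01-10', '2022-01-05'),
  (1, '2022-03-01', '2022-02-15'),
  (2, '2022-05-01', '2022-04-20');
""")

# Window version: keeps the whole earliest row per person.
ranked = conn.execute("""
SELECT person_id, working_date, start_date FROM (
  SELECT RANK() OVER (PARTITION BY person_id ORDER BY working_date) AS list,
         person_id, working_date, start_date
  FROM people) c
WHERE c.list = 1
ORDER BY person_id
""").fetchall()

# MIN()/GROUP BY version: one row per distinct (person_id, start_date),
# so person 1 still appears twice.
grouped = conn.execute("""
SELECT person_id, MIN(working_date), start_date
FROM people
GROUP BY person_id, start_date
ORDER BY person_id, start_date
""").fetchall()

print(len(ranked), len(grouped))
```

The ranked query returns 2 rows and the grouped query returns 3. Every extra non-aggregated column in the GROUP BY (start date, name, email) can split a person into more rows in the same way.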

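You haven't said why you want to remove these parts of the query, i.e. what the intent is. There may be another way to solve the underlying problem.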
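If the goal is simply to avoid the window function, one common alternative is to aggregate only the key and `MIN(working_date)` per person, then join that back to the detail rows. The sketch below, again using Python's `sqlite3` and a hypothetical `people` table, checks that this returns the same rows as `RANK() = 1`, including ties. It assumes `working_date` is never NULL, since MIN() ignores NULLs and an equality join never matches them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER, working_date TEXT, start_date TEXT);
INSERT INTO people VALUES
  (1, '2022-03-01', '2022-02-15'),
  (1, '2022-01-10', '2022-01-05'),
  (2, '2022-05-01', '2022-04-20'),
  (2, '2022-05-01', '2022-04-21'),
  (3, '2022-07-07', '2022-07-01');
""")

# Step 1: aggregate only the key and its earliest working_date.
# Step 2: join back to recover the other columns from the earliest row(s).
join_back = conn.execute("""
WITH first_dates AS (
  SELECT person_id, MIN(working_date) AS first_working_date
  FROM people
  GROUP BY person_id
)
SELECT p.person_id, p.working_date, p.start_date
FROM people p
JOIN first_dates f
  ON f.person_id = p.person_id
 AND f.first_working_date = p.working_date
ORDER BY p.person_id, p.start_date
""").fetchall()

# Reference result using the original RANK() = 1 pattern.
ranked = conn.execute("""
SELECT person_id, working_date, start_date FROM (
  SELECT RANK() OVER (PARTITION BY person_id ORDER BY working_date) AS list,
         person_id, working_date, start_date
  FROM people) c
WHERE c.list = 1
ORDER BY person_id, start_date
""").fetchall()

print(join_back == ranked)
```

Whether the join-back form is actually faster than the window function on Redshift depends on table size, distribution and sort keys, so it is worth comparing both with EXPLAIN before switching.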

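Also, please ask specific questions on the forum. This reads like "do my work for me," which won't get much attention from people who are paid to do this kind of work.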

© www.soinside.com 2019 - 2024. All rights reserved.