Redshift - Get each person's first working date. I want a table with that information plus their first/last name, email address, and start date

Question

    applicants_cte as (
        select person_key
        from applicants.people
        where date between '01-01-2022' and '01-01-2023'
        group by person_key
    ),

    list_of_applicants as (
        select *
        from (
            select rank() over (partition by ap.person_id
                                order by ap.working_date asc) as list,
                   ap.person_id,
                   date(ap.start_date) as start_date,
                   a.first_name,
                   a.last_name,
                   b.email_address
            from applicants.people ap
            join information a on a.person_key = ap.id_key
            join contact_information b on b.person_info_id = ap.id_key
            where date(ap.start_date) is not null
              and ap.person_id in (select applicant_id
                                   from other_cte)
        ) c
        where c.list = 1
    )

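I am trying to optimize the list_of_applicants CTE by:

Removing the subquery
Removing rank() and converting it to where/group by
Removing the order by and converting it to min(ap.working_date)

Problems I ran into:

When I run select count(*) from list_of_applicants, the count I get back is off by 200,000
I am not sure how to restructure the CTE once the subquery is removed

When I tried to refactor it, the count differed by 200,000. This is what the code looks like: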

    applicants_cte as (
        select person_key
        from applicants.people
        where date between '01-01-2022' and '01-01-2023'
        group by person_key
    ),

    list_of_applicants as (
        -- Removed subquery and rank()
        /* select *
           from (select rank() over (partition by ap.person_id
                                     order by ap.working_date asc) list,
        */
        select min(ap.working_date),
               ap.person_id,
               date(ap.start_date) as start_date,
               a.first_name,
               a.last_name,
               b.email_address
        from applicants.people ap
        join information a on a.person_key = ap.id_key
        join contact_information b on b.person_info_id = ap.id_key
        -- replaced where statement with join
        join applicants_cte apps on apps.person_key = apps.person_key
        where date(ap.start_date) is not null
        /* where date(applicants.start_date) IS NOT NULL
                 and ap.person_id in (select applicant_id
                                      from applicants_cte)) c
           where c.list = 1
        */
        group by ap.person_id,
                 start_date,
                 a.first_name,
                 a.last_name,
                 b.email_address
    )

Answer 1

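What you are trying to do will not work. The RANK() window function, together with the subquery that filters on rank = 1 and the ORDER BY clause of the window function, provides specific functionality that min() and GROUP BY cannot. I suggest reading up on window functions and how they process data differently from GROUP BY.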
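To make the difference concrete, here is a small sketch using made-up data. The `shifts` table and its rows are hypothetical and are not taken from the question's schema; the point is only to show how the two approaches can return different row counts.

```sql
-- Hypothetical sample data: person 1 has two rows that differ in email_address.
create temp table shifts (
    person_id     int,
    email_address varchar(50),
    working_date  date
);

insert into shifts values
    (1, 'old@example.com', '2022-03-01'),
    (1, 'new@example.com', '2022-05-01'),
    (2, 'sam@example.com', '2022-04-10');

-- Window version: every row is kept and ranked within its person,
-- then the outer filter keeps the whole earliest row for each person.
select person_id, email_address, working_date
from (
    select s.*,
           rank() over (partition by person_id order by working_date asc) as rnk
    from shifts s
) ranked
where rnk = 1;

-- Aggregate version: grouping by email_address as well as person_id
-- makes one group per distinct combination, so person 1 appears twice.
select person_id, email_address, min(working_date) as first_working_date
from shifts
group by person_id, email_address;
```

With this data the first query returns two rows and the second returns three. Every extra column in the GROUP BY can split a person into more groups, whereas the window version picks complete rows. Note that rank() itself returns several rows for a person when multiple rows tie on the earliest date.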

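You did not say why you want to remove these parts of the query, that is, what your intent is. There may be another way to solve this problem.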
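If the goal is only to avoid the window function, one possible alternative is to compute each person's earliest working_date in its own aggregate and join back to pick up the other columns. The sketch below is an assumption about the intent, and it uses a hypothetical `work_days` table rather than the tables in the question. Like rank(), it returns more than one row for a person when several rows share the earliest date.

```sql
-- Hypothetical sample data (not the question's schema).
create temp table work_days (
    person_id     int,
    email_address varchar(50),
    working_date  date
);

insert into work_days values
    (1, 'old@example.com', '2022-03-01'),
    (1, 'new@example.com', '2022-05-01'),
    (2, 'sam@example.com', '2022-04-10');

with first_dates as (
    -- Group by the person only, so there is exactly one row per person.
    select person_id, min(working_date) as first_working_date
    from work_days
    group by person_id
)
select w.person_id, w.email_address, w.working_date
from work_days w
join first_dates f
  on f.person_id = w.person_id               -- compare columns from both sides
 and f.first_working_date = w.working_date;  -- keep only the earliest row
```

Note that the join condition compares a column from each side. The rewritten query in the question joins on `apps.person_key = apps.person_key`, which compares the column with itself and is true for every non-null key, so it no longer restricts the rows to the people in applicants_cte. That may account for part of the difference in counts.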

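Also, please ask specific questions on the forum. This reads like "do my work for me", which will not get much attention from people who are paid to do exactly that.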
