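I have to build a Hive SQL query for the following requirement.
I have a customer table. I need to split the table's total records into 6 parts (say the table holds 600 records; then each of the 6 months gets 100 records), and each month has a bracket that caps how many customers to target. If the bracket limit is N, I need to pick N unique email IDs from N unique accounts: for example, 5 unique email IDs from 5 unique accounts, or 10 unique email IDs from 10 unique accounts.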
Note: I am using a mod operation to distribute the records across the 6 months.

Account | Email | mod,6 |
---|---|---|
acc1 | email@acc1 | 1 |
acc2 | email1@acc2 | 1 |
acc2 | email2@acc2 | 2 |
acc2 | email3@acc2 | 3 |
acc2 | email4@acc2 | 4 |
acc2 | email5@acc2 | 5 |
acc2 | email6@acc2 | 6 |
acc2 | email7@acc2 | 1 |
acc3 | email1@acc3 | 1 |
acc3 | email2@acc3 | 2 |
acc3 | email3@acc3 | 3 |
acc4 | email@acc4 | 1 |
acc5 | email1@acc5 | 1 |
acc5 | email2@acc5 | 2 |
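Expected output for a bracket of 4 (acc5 is not needed in the output below, because the record count has already reached the bracket limit of 4):

Account | Email | mod,6 |
---|---|---|
acc1 | email@acc1 | 1 |
acc2 | email1@acc2 | 1 |
acc3 | email1@acc3 | 1 |
acc4 | email@acc4 | 1 |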
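Expected output if the bracket is 8 (I have to pick all the unique accounts first, then pick the remaining records in sequence until the bracket limit is reached):

Account | Email | mod,6 |
---|---|---|
acc1 | email@acc1 | 1 |
acc2 | email1@acc2 | 1 |
acc3 | email1@acc3 | 1 |
acc4 | email@acc4 | 1 |
acc5 | email1@acc5 | 1 |
acc2 | email7@acc2 | 1 |
acc2 | email2@acc2 | 2 |
acc3 | email2@acc3 | 2 |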
Expected output if the bracket is 10:

Account | Email | mod,6 |
---|---|---|
acc1 | email@acc1 | 1 |
acc2 | email1@acc2 | 1 |
acc3 | email1@acc3 | 1 |
acc4 | email@acc4 | 1 |
acc5 | email1@acc5 | 1 |
acc2 | email7@acc2 | 1 |
acc2 | email2@acc2 | 2 |
acc3 | email2@acc3 | 2 |
acc5 | email2@acc5 | 2 |
acc2 | email3@acc2 | 3 |
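I tried the query below, but it picks up all the records with mod value 1 first. I don't know how to first get one record per unique account with mod_seq value 1, and then take the remaining records starting again from mod seq 1.

    select * from (
        select a.*, row_number() over (order by mod_num_seq, acc_count) as rnk
        from (
            select account, email,
                   count(*) over (partition by account) as acc_count,
                   -- map a remainder of 0 to 6 so the sequence runs 1..6
                   case
                       when mod(row_number() over (partition by account), 6) = 0 then 6
                       else mod(row_number() over (partition by account), 6)
                   end as mod_num_seq
            from customer
        ) a
    ) b where rnk <= {:bracket}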
I'm not sure why rows 6 and 7 of your expected output are email7@acc2, email2@acc2 rather than email7@acc2, email2@acc3. The query below produces the latter order.

    -- Sample data matching the question's table
    with customer (account, email, x) as
    (
        select 'acc1','email@acc1', 1 from dual
        union all select 'acc2','email1@acc2', 1 from dual
        union all select 'acc2','email2@acc2', 2 from dual
        union all select 'acc2','email3@acc2', 3 from dual
        union all select 'acc2','email4@acc2', 4 from dual
        union all select 'acc2','email5@acc2', 5 from dual
        union all select 'acc2','email6@acc2', 6 from dual
        union all select 'acc2','email7@acc2', 1 from dual
        union all select 'acc3','email1@acc3', 1 from dual
        union all select 'acc3','email2@acc3', 2 from dual
        union all select 'acc3','email3@acc3', 3 from dual
        union all select 'acc4','email@acc4', 1 from dual
        union all select 'acc5','email1@acc5', 1 from dual
        union all select 'acc5','email2@acc5', 2 from dual
    )
    -- rn: position of the row within its account, cycled through 1..6
    , t as
    (
        select c.*, nvl(nullif(mod(row_number() over (partition by account order by email),6),0),6) rn
        from customer c
    )
    -- pick_up_order: rank within the account, so sorting by it takes one row per account on each pass
    select t.*, row_number() over (partition by account order by rn, email) pick_up_order
    from t
    order by pick_up_order, account;
Result:

    ACCOUNT EMAIL                X         RN PICK_UP_ORDER
    ------- ----------- ---------- ---------- -------------
    acc1    email@acc1           1          1             1
    acc2    email1@acc2          1          1             1
    acc3    email1@acc3          1          1             1
    acc4    email@acc4           1          1             1
    acc5    email1@acc5          1          1             1
    acc2    email7@acc2          1          1             2
    acc3    email2@acc3          2          2             2
    acc5    email2@acc5          2          2             2
    acc2    email2@acc2          2          2             3
    acc3    email3@acc3          3          3             3
    acc2    email3@acc2          3          3             4
    acc2    email4@acc2          4          4             5
    acc2    email5@acc2          5          5             6
    acc2    email6@acc2          6          6             7

    14 rows selected.
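The query above only computes a pick-up order; it does not yet cut the result off at the bracket. Below is a minimal sketch of one way to add that cut-off in Hive, under a few assumptions that are mine rather than the question's: the table is `customer(account, email)`, the bracket is hard-coded as 8 instead of being passed as a parameter, and the 1..6 cycle is written with the `%` operator in place of `nvl(nullif(mod(...)))`. It ranks the first row of every account ahead of everything else, then orders the remaining rows by `rn`, which is how I read the expected output in the question. I have not run it on a Hive cluster, so treat it as a starting point.

```sql
-- Hypothetical sketch: table/column names and the literal 8 are assumptions.
with t as (
    select account, email,
           -- position within the account, cycled through 1..6
           ((row_number() over (partition by account order by email) - 1) % 6) + 1 as rn
    from customer
),
p as (
    select t.*,
           -- 1 marks the first row picked for each account
           row_number() over (partition by account order by rn, email) as pick_up_order
    from t
),
r as (
    select p.*,
           -- tier 0: one row per unique account; tier 1: all remaining rows by rn
           row_number() over (
               order by case when pick_up_order = 1 then 0 else 1 end,
                        rn, account, email
           ) as rnk
    from p
)
select account, email, rn
from r
where rnk <= 8          -- bracket limit (assumed value)
order by rnk;
```

On the sample data this ordering should give the question's expected rows for brackets of 4, 8 and 10. If the order shown in the result above is what you want instead, replace the `order by` inside `rnk` with `pick_up_order, account`.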