Table 1
client | how_many_to_pick |
---|---|
a | 2 |
b | 1 |
c | 3 |
Table 2
client | data |
---|---|
a | a123 |
a | a234 |
a | a345 |
a | a456 |
b | b123 |
b | b234 |
b | b345 |
b | b456 |
c | c123 |
c | c234 |
c | c345 |
c | c456 |
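I want to select X random rows (or the top X rows, if that is easier) from table 2 for each client, where X is that client's how_many_to_pick from the first table.
For the example above, the output should be
client | data |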
---|---|
a | a123 |
a | a234 |
b | b123 |
c | c123 |
c | c234 |
c | c345 |
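If X were the same number for every client, I would use row_number() over partition to create an incremental number as an index and select the rows where the index <= X, but I don't know how to do it when X is different for every client.
I am doing this in BigQuery.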
Here is one way to do it based on your description (it is not really random, but it does use a row number):
with table1 as (
select 'a' as client, 2 as how_many_to_pick
union all
select 'b' as client, 1 as how_many_to_pick
),
table2 as (
select 'a' as client, 'a123' as data
union all
select 'a' as client, 'a234' as data
union all
select 'a' as client, 'a345' as data
union all
select 'a' as client, 'a456' as data
union all
select 'b' as client, 'b123' as data
union all
select 'b' as client, 'b234' as data
union all
select 'b' as client, 'b345' as data
)
select t2.client, t2.data
from (select *, row_number() over(partition by client order by data) as rnk
from table2
) as t2
join table1 as t1
on t1.client = t2.client
where t2.rnk <= t1.how_many_to_pick
Here you can filter in the where clause as shown above, or you can put the condition in the join instead:
on t1.client = t2.client and t2.rnk <= t1.how_many_to_pick
Either way, it gives you the output you need.
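If a random pick is wanted rather than the top X rows by data, one option is to number the rows in random order within each client. The following is a minimal sketch of that idea, assuming BigQuery standard SQL and its rand() function. The inline table1 and table2 CTEs only reproduce the sample data from the question, and the final order by is there for readability only.

```sql
-- Sketch: pick how_many_to_pick random rows per client.
-- table1 and table2 are inlined here as sample data; in practice,
-- replace the CTEs with the real tables.
with table1 as (
  select 'a' as client, 2 as how_many_to_pick union all
  select 'b', 1 union all
  select 'c', 3
),
table2 as (
  select 'a' as client, 'a123' as data union all
  select 'a', 'a234' union all
  select 'a', 'a345' union all
  select 'a', 'a456' union all
  select 'b', 'b123' union all
  select 'b', 'b234' union all
  select 'b', 'b345' union all
  select 'b', 'b456' union all
  select 'c', 'c123' union all
  select 'c', 'c234' union all
  select 'c', 'c345' union all
  select 'c', 'c456'
)
select t2.client, t2.data
from (
  -- order by rand() shuffles the rows inside each client partition,
  -- so rnk 1..X refers to a random subset rather than the first X by data
  select *, row_number() over (partition by client order by rand()) as rnk
  from table2
) as t2
join table1 as t1
  on t1.client = t2.client
where t2.rnk <= t1.how_many_to_pick
order by t2.client, t2.data
```

Because the ordering is random, the rows returned differ from run to run, but each client still gets at most how_many_to_pick rows. Clients with no row in table1 are dropped by the inner join.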