For a project I'm working on, I was asked to solve this task:
Given the table events:
- event_id int (auto-increment) -- 10B distinct values
- event_ts datetime -- 10B
- event_type int (1 = impression, 2 = click, 3 = purchase, ...) -- 20 types
- product_id int -- 100K
- client_id int -- 10M
- client_type int -- 10
Q: Find the number of clients who saw an impression of a product before purchasing it.
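To pin down what the query has to compute, here is a minimal in-memory reference in Python (an illustration of the semantics, not a way to process 10B rows). It assumes a client counts if, for some product, one of their impressions of that product is strictly earlier than one of their purchases of the same product. Only the earliest impression per (client, product) pair matters, so one pass records it and a second pass checks the purchases. The sample rows are hypothetical and follow the column order of the events table.

```python
# Hypothetical sample rows:
# (event_id, event_ts, event_type, product_id, client_id, client_type)
# Timestamps are "YYYY-MM-DD HH:MM" strings, so string comparison orders them correctly.
SAMPLE_EVENTS = [
    (1, "2024-01-01 10:00", 1, 100, 1, 1),   # client 1 sees product 100...
    (2, "2024-01-01 11:00", 3, 100, 1, 1),   # ...then buys it: counts
    (3, "2024-01-02 10:00", 3, 200, 2, 1),   # client 2 buys product 200...
    (4, "2024-01-02 11:00", 1, 200, 2, 1),   # ...and only sees it later: no
    (5, "2024-01-03 10:00", 1, 300, 3, 2),   # client 3 sees product 300...
    (6, "2024-01-03 11:00", 3, 400, 3, 2),   # ...but buys product 400: no
    (7, "2024-01-04 09:00", 3, 500, 4, 2),   # client 4 buys with no impression: no
    (8, "2024-01-05 09:00", 1, 600, 5, 1),   # client 5 sees product 600 twice...
    (9, "2024-01-05 09:30", 1, 600, 5, 1),
    (10, "2024-01-05 10:00", 3, 600, 5, 1),  # ...and buys it twice: counted once
    (11, "2024-01-05 12:00", 3, 600, 5, 1),
]

IMPRESSION, PURCHASE = 1, 3


def count_clients_impression_before_purchase(events):
    """Count distinct clients with an impression of a product before a purchase of it."""
    # Pass 1: earliest impression timestamp per (client_id, product_id).
    first_seen = {}
    for _, ts, event_type, product_id, client_id, _ in events:
        if event_type == IMPRESSION:
            key = (client_id, product_id)
            if key not in first_seen or ts < first_seen[key]:
                first_seen[key] = ts
    # Pass 2: a purchase qualifies if the same pair was seen strictly earlier.
    clients = set()
    for _, ts, event_type, product_id, client_id, _ in events:
        if event_type == PURCHASE:
            seen = first_seen.get((client_id, product_id))
            if seen is not None and seen < ts:
                clients.add(client_id)
    return len(clients)


print(count_clients_impression_before_purchase(SAMPLE_EVENTS))  # 2 (clients 1 and 5)
```

Comparing against the earliest impression is enough: if any impression precedes a purchase, the earliest one does too.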
I came up with two solutions:
1)
With cteProdsClients As (
select e1.product_id, e1.client_id
from events as e1
where e1.event_type = 3 -- purchase
AND EXISTS (SELECT e2.product_id, e2.client_id
FROM events as e2
WHERE e2.event_type = 1 -- impression
AND e1.product_id = e2.product_id
AND e1.client_id = e2.client_id
AND e2.event_ts < e1.event_ts) -- the impression must come before the purchase
)
SELECT
count(distinct client_id)
FROM
cteProdsClients;
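The EXISTS form can be checked quickly against hypothetical sample data in an in-memory SQLite database (SQLite stands in for the real engine here; the table layout is copied from the question, the rows and the helper `run_count` are made up for illustration). It also shows why the direction of the time comparison matters: with e1 as the purchase and e2 as the impression, the condition has to be e2.event_ts < e1.event_ts, and flipping it asks the opposite question.

```python
import sqlite3

# Hypothetical sample rows:
# (event_id, event_ts, event_type, product_id, client_id, client_type)
ROWS = [
    (1, "2024-01-01 10:00", 1, 100, 1, 1),   # client 1: impression, then purchase
    (2, "2024-01-01 11:00", 3, 100, 1, 1),
    (3, "2024-01-02 10:00", 3, 200, 2, 1),   # client 2: purchase, then impression
    (4, "2024-01-02 11:00", 1, 200, 2, 1),
    (5, "2024-01-03 10:00", 1, 300, 3, 2),   # client 3: sees one product, buys another
    (6, "2024-01-03 11:00", 3, 400, 3, 2),
    (7, "2024-01-04 09:00", 3, 500, 4, 2),   # client 4: purchase only
    (8, "2024-01-05 09:00", 1, 600, 5, 1),   # client 5: two impressions, two purchases
    (9, "2024-01-05 09:30", 1, 600, 5, 1),
    (10, "2024-01-05 10:00", 3, 600, 5, 1),
    (11, "2024-01-05 12:00", 3, 600, 5, 1),
]

EXISTS_QUERY = """
WITH cteProdsClients AS (
    SELECT e1.product_id, e1.client_id
    FROM events AS e1
    WHERE e1.event_type = 3                   -- purchase
      AND EXISTS (SELECT 1
                  FROM events AS e2
                  WHERE e2.event_type = 1     -- impression
                    AND e2.product_id = e1.product_id
                    AND e2.client_id = e1.client_id
                    AND e2.event_ts < e1.event_ts)
)
SELECT COUNT(DISTINCT client_id) FROM cteProdsClients
"""
# Same query with the comparison flipped: "impression AFTER purchase".
FLIPPED_QUERY = EXISTS_QUERY.replace(
    "e2.event_ts < e1.event_ts", "e1.event_ts < e2.event_ts"
)


def run_count(query, rows):
    """Load rows into an in-memory SQLite table and return the query's single value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_ts TEXT, "
            "event_type INTEGER, product_id INTEGER, client_id INTEGER, "
            "client_type INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


print(run_count(EXISTS_QUERY, ROWS))   # 2 (clients 1 and 5)
print(run_count(FLIPPED_QUERY, ROWS))  # 1 (client 2 only)
```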
2)
With cteProdsClients As (
select e1.product_id, e1.client_id
from events as e1
inner join events as e2 -- a LEFT JOIN here would keep every purchase, matched or not
ON e1.product_id = e2.product_id
AND e1.client_id = e2.client_id
AND e2.event_type = 1 -- impression
AND e2.event_ts < e1.event_ts -- the impression must come before the purchase
WHERE e1.event_type = 3 -- purchase
)
SELECT
count(distinct client_id)
FROM
cteProdsClients;
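Whether the second query joins with LEFT JOIN or INNER JOIN changes the answer, not just the speed. A LEFT JOIN keeps every row of e1 and pads unmatched rows with NULLs, so if the only filter is e1.event_type = 3 it counts every purchaser. (Separately, standard SQL does not let a derived table refer to e1 from inside its own subquery unless it is written as LATERAL, or CROSS/OUTER APPLY in SQL Server.) The sketch below compares both joins on hypothetical data in SQLite; `run_count` is an illustrative helper.

```python
import sqlite3

# Hypothetical sample rows:
# (event_id, event_ts, event_type, product_id, client_id, client_type)
ROWS = [
    (1, "2024-01-01 10:00", 1, 100, 1, 1),   # client 1: impression, then purchase
    (2, "2024-01-01 11:00", 3, 100, 1, 1),
    (3, "2024-01-02 10:00", 3, 200, 2, 1),   # client 2: purchase, then impression
    (4, "2024-01-02 11:00", 1, 200, 2, 1),
    (5, "2024-01-03 10:00", 1, 300, 3, 2),   # client 3: sees one product, buys another
    (6, "2024-01-03 11:00", 3, 400, 3, 2),
    (7, "2024-01-04 09:00", 3, 500, 4, 2),   # client 4: purchase only
    (8, "2024-01-05 09:00", 1, 600, 5, 1),   # client 5: two impressions, two purchases
    (9, "2024-01-05 09:30", 1, 600, 5, 1),
    (10, "2024-01-05 10:00", 3, 600, 5, 1),
    (11, "2024-01-05 12:00", 3, 600, 5, 1),
]

LEFT_JOIN_QUERY = """
SELECT COUNT(DISTINCT e1.client_id)
FROM events AS e1
LEFT JOIN events AS e2
       ON e2.product_id = e1.product_id
      AND e2.client_id = e1.client_id
      AND e2.event_type = 1
      AND e2.event_ts < e1.event_ts
WHERE e1.event_type = 3
"""
# Same conditions, but purchases without a matching impression are dropped.
INNER_JOIN_QUERY = LEFT_JOIN_QUERY.replace("LEFT JOIN", "INNER JOIN")


def run_count(query, rows):
    """Load rows into an in-memory SQLite table and return the query's single value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_ts TEXT, "
            "event_type INTEGER, product_id INTEGER, client_id INTEGER, "
            "client_type INTEGER)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


print(run_count(LEFT_JOIN_QUERY, ROWS))   # 5: every client who bought anything
print(run_count(INNER_JOIN_QUERY, ROWS))  # 2: clients 1 and 5
```

Adding WHERE e2.event_id IS NOT NULL to the LEFT JOIN version would give the same result as the INNER JOIN, which is simply the more direct way to say it.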
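Not only do I need to write a query that produces the output, I also need to do it in the most efficient, optimized way. Which of these two is better? I would appreciate it if you could fix the errors (if any) and suggest a better solution. Thanks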
You seem to be overcomplicating this; I believe this will give you what you want. Performance will depend on appropriate indexing, etc.
SELECT COUNT(DISTINCT E1.CLIENT_ID)
FROM EVENTS E1
INNER JOIN EVENTS E2 ON
E1.CLIENT_ID = E2.CLIENT_ID AND
E1.PRODUCT_ID = E2.PRODUCT_ID AND
E1.EVENT_TS < E2.EVENT_TS AND
E2.EVENT_TYPE = 3
WHERE E1.EVENT_TYPE = 1
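Here E1 is the impression and E2 the purchase, so E1.EVENT_TS < E2.EVENT_TS is the right direction. The join returns one row per matching (impression, purchase) pair and COUNT(DISTINCT ...) collapses them afterwards. The sketch below runs the answer's query on hypothetical data in SQLite and also counts the pairs before DISTINCT: client 5's two impressions and two purchases alone yield four pairs. On 10B rows that fan-out is one reason a semi-join (EXISTS) can be cheaper than a plain join, although this depends on the optimizer, which may rewrite one form into the other. The index `idx_events_client_product` and its column order (equality columns client_id, product_id, event_type first, the range column event_ts last) are an assumption about what "appropriate indexing" could look like, not something the answer specifies.

```python
import sqlite3

# Hypothetical sample rows:
# (event_id, event_ts, event_type, product_id, client_id, client_type)
ROWS = [
    (1, "2024-01-01 10:00", 1, 100, 1, 1),   # client 1: impression, then purchase
    (2, "2024-01-01 11:00", 3, 100, 1, 1),
    (3, "2024-01-02 10:00", 3, 200, 2, 1),   # client 2: purchase, then impression
    (4, "2024-01-02 11:00", 1, 200, 2, 1),
    (5, "2024-01-03 10:00", 1, 300, 3, 2),   # client 3: sees one product, buys another
    (6, "2024-01-03 11:00", 3, 400, 3, 2),
    (7, "2024-01-04 09:00", 3, 500, 4, 2),   # client 4: purchase only
    (8, "2024-01-05 09:00", 1, 600, 5, 1),   # client 5: two impressions, two purchases
    (9, "2024-01-05 09:30", 1, 600, 5, 1),
    (10, "2024-01-05 10:00", 3, 600, 5, 1),
    (11, "2024-01-05 12:00", 3, 600, 5, 1),
]

# The answer's query (SQLite identifiers are case-insensitive).
ANSWER_QUERY = """
SELECT COUNT(DISTINCT E1.CLIENT_ID)
FROM EVENTS E1
INNER JOIN EVENTS E2 ON
E1.CLIENT_ID = E2.CLIENT_ID AND
E1.PRODUCT_ID = E2.PRODUCT_ID AND
E1.EVENT_TS < E2.EVENT_TS AND
E2.EVENT_TYPE = 3
WHERE E1.EVENT_TYPE = 1
"""
# Number of (impression, purchase) pairs the join produces before DISTINCT.
PAIR_COUNT_QUERY = ANSWER_QUERY.replace("COUNT(DISTINCT E1.CLIENT_ID)", "COUNT(*)")


def run_count(query, rows):
    """Load rows into an indexed in-memory SQLite table and return the query's value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_ts TEXT, "
            "event_type INTEGER, product_id INTEGER, client_id INTEGER, "
            "client_type INTEGER)"
        )
        # Hypothetical composite index: equality columns first, range column last.
        conn.execute(
            "CREATE INDEX idx_events_client_product "
            "ON events (client_id, product_id, event_type, event_ts)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


print(run_count(ANSWER_QUERY, ROWS))      # 2 distinct clients
print(run_count(PAIR_COUNT_QUERY, ROWS))  # 5 joined pairs
```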