Snowflake - UPDATE with a correlated subquery using TIMEDIFF


I am running this query on a Snowflake database:

UPDATE "click" c
SET "Registration_score" =
(SELECT COUNT(*) FROM "trackingpoint" t
WHERE 1=1
AND c."CookieID" = t."CookieID"
AND t."page" ilike '%Registration complete'
AND TIMEDIFF(minute,c."Timestamp",t."Timestamp") < 4320
AND TIMEDIFF(second,c."Timestamp",t."Timestamp") > 0);

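The database returns "Unsupported subquery type cannot be evaluated". However, if I run it without the last two conditions (the ones using TIMEDIFF), it works fine. I confirmed with these queries that the TIMEDIFF expressions themselves are valid: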

select count(*) from "trackingpoint"
where TIMEDIFF(minute, '2018-01-01', "Timestamp") > 604233;
select count(*) from "click"
where TIMEDIFF(minute, '2018-01-01', "Timestamp") > 604233;

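Both of these run without problems. I don't see why the TIMEDIFF conditions would prevent the database from returning a result. Any idea what I should change to make this work?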

sql sql-update correlated-subquery snowflake-datawarehouse snowflake
1 Answer

Using the following setup:

create table click (id number, 
   timestamp timestamp_ntz,
   cookieid number,
   Registration_score number);
create table trackingpoint(id number, 
   timestamp timestamp_ntz, 
   cookieid number, 
   page text );


insert into click values (1,'2018-03-20', 101, 0),
    (2,'2019-03-20', 102, 0);
insert into trackingpoint values (1,'2018-03-20 00:00:10', 101, 'user reg comp'),
    (2,'2018-03-20 00:00:11', 102, 'user reg comp'),
    (3,'2018-03-20 00:00:13', 102, 'pet reg comp'),
    (4,'2018-03-20 00:00:15', 102, 'happy dance');

you can see that a plain join returns the rows we expect:

select c.*, t.*
from click c
join trackingpoint t 
    on c.cookieid = t.cookieid ;
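Before adding the time filters, it can help to look at the TIMEDIFF values for each joined pair. The following is a minimal sketch against the sample tables above; the aliases diff_minutes and diff_seconds are illustrative names, not part of the original query.

```sql
-- Inspect the time difference for every click / trackingpoint pair.
-- A negative value means the tracking point happened BEFORE the click.
SELECT c.id AS click_id,
       t.id AS trackingpoint_id,
       t.page,
       TIMEDIFF(minute, c.timestamp, t.timestamp) AS diff_minutes,
       TIMEDIFF(second, c.timestamp, t.timestamp) AS diff_seconds
FROM click c
JOIN trackingpoint t
    ON c.cookieid = t.cookieid
ORDER BY c.id, t.id;
```

With the sample data, click 2 is dated 2019 while its tracking points are from 2018, so its differences are negative. A filter of only `< 4320` still accepts those rows; the question's second condition (`> 0`) is what excludes them.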

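There are two ways to compute the count. The first is essentially what you already have, which is fine if you are only counting one thing, because all of the rules act as join filters: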

select c.id,
  count(1) as new_score
from click c
join trackingpoint t 
    on c.cookieid = t.cookieid
    and t.page ilike '%reg comp'
    and TIMEDIFF(minute, c.timestamp, t.timestamp) < 4320
group by 1;

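Alternatively (in Snowflake syntax) you can move the counting into the aggregate/select side, which lets you compute several different counts in one pass if that is what you need (this is the situation I find myself in more often, which is why I mention it):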

select c.id,
    sum(iff(t.page ilike '%reg comp' AND TIMEDIFF(minute, c.timestamp, t.timestamp) < 4320, 1, 0)) as new_score
from click c
join trackingpoint t 
    on c.cookieid = t.cookieid
group by 1;
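To illustrate the "multiple answers" point, here is a sketch that computes two counts in a single pass over the same join. The second column, dance_score, and its 'happy dance' filter are hypothetical and only reuse a page value from the sample data.

```sql
-- One join, several conditional counts: each SUM(IFF(...)) applies its own rule.
SELECT c.id,
    SUM(IFF(t.page ILIKE '%reg comp'
            AND TIMEDIFF(minute, c.timestamp, t.timestamp) < 4320, 1, 0)) AS reg_score,
    SUM(IFF(t.page ILIKE 'happy dance'
            AND TIMEDIFF(minute, c.timestamp, t.timestamp) < 4320, 1, 0)) AS dance_score  -- hypothetical second metric
FROM click c
JOIN trackingpoint t
    ON c.cookieid = t.cookieid
GROUP BY c.id;
```

Because the page and time rules live inside the aggregates rather than in the join condition, each output column can use a different rule without a second scan of the tables.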

So, plugging this into the UPDATE ... FROM pattern (see the last example in the documentation: https://docs.snowflake.net/manuals/sql-reference/sql/update.html)

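you can switch to a single sub-select instead of this kind of correlated subquery, which Snowflake does not support; that is what the error message you are getting is telling you.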

UPDATE click c
SET Registration_score = s.new_score
from (
    select ic.id,
        count(*) as new_score
    from click ic
    join trackingpoint it 
        on ic.cookieid = it.cookieid
        and it.page ilike '%reg comp'
        and TIMEDIFF(minute, ic.timestamp, it.timestamp) < 4320
    group by 1) as s
WHERE c.id = s.id; 

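The reason adding TIMEDIFF causes the problem is that the subquery now compares columns of each row being updated with columns of the subquery's own rows, beyond the simple equality on CookieID, so every row of the UPDATE depends on the subquery result; that dependency is the correlation. The workaround is to build one "big but simple" subquery and join to it.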
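Two details differ between the UPDATE above and the query in the question. The question also required TIMEDIFF(second, ...) > 0, and its correlated COUNT(*) would have written 0 for clicks with no matching tracking point, whereas an inner join leaves those rows untouched. A sketch that restores both behaviours, assuming the sample tables from this answer, could look like this:

```sql
-- LEFT JOIN keeps every click; COUNT(it.id) counts only matched tracking points,
-- so clicks without a qualifying match get 0 instead of being skipped.
UPDATE click c
SET Registration_score = s.new_score
FROM (
    SELECT ic.id,
           COUNT(it.id) AS new_score
    FROM click ic
    LEFT JOIN trackingpoint it
        ON ic.cookieid = it.cookieid
        AND it.page ILIKE '%reg comp'
        AND TIMEDIFF(minute, ic.timestamp, it.timestamp) < 4320
        AND TIMEDIFF(second, ic.timestamp, it.timestamp) > 0   -- only events after the click
    GROUP BY ic.id
) AS s
WHERE c.id = s.id;
```

All filters stay in the ON clause rather than in a WHERE clause; moving them to WHERE would discard the unmatched clicks and turn the LEFT JOIN back into an inner join.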
