Hive - Update records in a table with today's date if they are not found in another table?

Question (votes: 1, answers: 1)

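I currently have a master results table (test1) that stores all of my issue records, and a second table (test2) that is populated by a weekly run. I am trying to find the records that are missing from the weekly update and then set their date in the master results table, because a record that no longer appears in the update has been corrected in the source system.

I am also trying to add the records from test2 to test1 if they are not already in that table.

This works: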

insert into table test1 (id, name, code)
select * from test2 t2 where t2.id not in (select id from test1);
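
A note on the statement above: under standard SQL semantics, `NOT IN` matches nothing if the subquery returns even one NULL `id`, and `select *` depends on test2's column order matching the insert column list. A minimal sketch of an equivalent anti-join, assuming `id` is the key in both tables and that test2 has exactly the columns `id`, `name`, `code` as in the sample data:

```sql
-- Insert rows from test2 whose id has no match in test1.
-- The LEFT JOIN keeps every test2 row; unmatched rows have t1.id IS NULL.
INSERT INTO TABLE test1 (id, name, code)
SELECT t2.id, t2.name, t2.code
FROM test2 t2
LEFT JOIN test1 t1 ON t1.id = t2.id
WHERE t1.id IS NULL;
```

Naming the columns explicitly keeps the insert correct even if test2 later gains extra columns.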

I am trying to update the 'Corrected_date' column of test1 so that it shows current_date for every record that is found in test1 but not in test2.

Sample data:

Table 1

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3    
29    TEST2   90 

Table 2

ID    NAME    CODE  
12    TEST5   20
1     TEST    3

Expected final result for Table 1

ID    NAME    CODE    CORRECTED_DATE
1     TEST    3       
29    TEST2   90       3/13/2019
12    TEST5   20
hive jupyter-notebook hiveql
1 Answer (0 votes)

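Overwrite the table using a FULL JOIN. A FULL JOIN returns the joined records, plus the records from the left table that did not join, plus the records from the right table that did not join. You can implement the logic with case statements like this: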

insert OVERWRITE table test1

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in t1 but not in t2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;
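
One thing to watch if this runs every week: a record that stays absent from test2 gets its CORRECTED_DATE reset to the current date on each run. If the first correction date should be kept instead, a possible variant is sketched below. It assumes CORRECTED_DATE is a DATE column (cast `current_date` otherwise) and that tables test1 and test2 exist as in the question; the `coalesce` is my addition, not part of the original logic.

```sql
insert OVERWRITE table test1
select
      -- take t1 if the row exists in t1 (alone or in both), t2 if only t2 has it
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      -- found in t1 but not in t2: keep an existing date, otherwise stamp today
      case when (t1.ID is not null) and (t2.ID is null)
           then coalesce(t1.CORRECTED_DATE, current_date)
           else t1.CORRECTED_DATE
      end as CORRECTED_DATE
  from test1 t1
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;
```

Whether a record that reappears in test2 should have its date cleared is a separate decision; this sketch leaves the stored date untouched in that case, as the original does.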

See also this similar question about incremental updates. Your logic is different, but the approach is the same: https://stackoverflow.com/a/37744071/2700344

Test with your data:

with test1 as (
select stack (2,
1, 'TEST',    3,null,    
29,'TEST2',   90 , null
             ) as (ID,NAME,CODE,CORRECTED_DATE)
),

     test2 as (
select stack (2,
              12,'TEST5',20,
              1,'TEST',3
             ) as (ID, NAME, CODE)
)

select 
      --select t1 if both or t1 only exist, t2 if only t2 exists
      case when t1.ID is null then t2.ID   else t1.ID   end as ID,
      case when t1.ID is null then t2.NAME else t1.NAME end as NAME,
      case when t1.ID is null then t2.CODE else t1.CODE end as CODE,

      --if found in test1 but not in test2 then current_date else leave as is
      case when (t1.ID is not null) and (t2.ID is null) then current_date else t1.CORRECTED_DATE end as CORRECTED_DATE 
  from test1 t1 
       FULL OUTER JOIN test2 t2 on t1.ID=t2.ID;

Result:

OK
id      name    code    corrected_date
1       TEST    3       NULL
12      TEST5   20      NULL
29      TEST2   90      2019-03-14
Time taken: 41.727 seconds, Fetched: 3 row(s)

The result is as expected.
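
Before overwriting a real table, it may help to preview which rows would receive a date. The following is a sketch only, reusing the same inline sample data as the test above; it selects the records present in test1 but absent from test2.

```sql
with test1 as (
select stack (2,
              1, 'TEST',  3,  null,
              29,'TEST2', 90, null
             ) as (ID,NAME,CODE,CORRECTED_DATE)
),

     test2 as (
select stack (2,
              12,'TEST5',20,
              1,'TEST',3
             ) as (ID, NAME, CODE)
)

-- rows in test1 with no matching ID in test2
select t1.ID, t1.NAME, t1.CODE
  from test1 t1
       LEFT OUTER JOIN test2 t2 on t1.ID=t2.ID
 where t2.ID is null;
```

With the sample data this returns only the row with ID 29 (TEST2, 90), which is the same row that received a date in the result above.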
