PySpark SQL: replace elements with NULL

Question (votes: 0, answers: 1)

I'm trying to write an SQL query to use in PySpark that scrubs information from a table. The table I want to modify looks like this:

  hashed_customer     firstname    lastname    email   order_id    status          timestamp
      eater 1_uuid  1_firstname  1_lastname  1_email    12345    OPTED_IN     2020-05-14 20:45:15
      eater 2_uuid  2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
      eater 3_uuid  3_firstname  3_lastname  3_email    34567    OPTED_IN     2020-05-14 19:31:55
      eater 4_uuid  4_firstname  4_lastname  4_email    45678    OPTED_IN     2020-05-14 17:49:27

I have another table listing the customers whose details I need to remove from the CUSTOMER ORDERS table. It looks like this:

hashed_customer    eaterstatus
   eater 1_uuid      OPTED_OUT
   eater 3_uuid      OPTED_OUT

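I'm trying to write an SQL query that removes the first name, last name, and email from the first table if the customer appears in the second table. Something like: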

DELETE firstname, lastname, email FROM Customer_TB 
    WHERE hashed_customer IN
        (SELECT hashed_customer FROM Customer_Remove_TB)

so that the final result would look like this:

hashed_customer     firstname    lastname    email   order_id    status          timestamp
   eater 1_uuid           NaN         NaN      NaN    12345    OPTED_IN     2020-05-14 20:45:15
   eater 2_uuid   2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
   eater 3_uuid           NaN         NaN      NaN    34567    OPTED_IN     2020-05-14 19:31:55
   eater 4_uuid   4_firstname  4_lastname  4_email    45678    OPTED_IN     2020-05-14 17:49:27
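A sketch of a query that produces the result above without modifying either table. It uses Python's built-in sqlite3 module as a stand-in SQL engine (an assumption; the question is about PySpark), with the table and column names and sample rows taken from the question. The query LEFT JOINs against the distinct IDs in the removal table and uses CASE WHEN to return NULL for matched customers. A join is used rather than an IN subquery inside CASE because Spark SQL has historically only accepted IN/EXISTS subqueries in WHERE-style filters. The SELECT text is plain SQL and should also run through spark.sql() once the two DataFrames are registered with createOrReplaceTempView("Customer_TB") and createOrReplaceTempView("Customer_Remove_TB"), though it has not been checked against a specific Spark version.

```python
import sqlite3

# In-memory SQLite database used as a stand-in SQL engine (assumption: not Spark).
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE Customer_TB (hashed_customer TEXT, firstname TEXT, lastname TEXT,"
    " email TEXT, order_id INTEGER, status TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Customer_TB VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("eater 1_uuid", "1_firstname", "1_lastname", "1_email", 12345, "OPTED_IN", "2020-05-14 20:45:15"),
        ("eater 2_uuid", "2_firstname", "2_lastname", "2_email", 23456, "OPTED_IN", "2020-05-14 20:29:22"),
        ("eater 3_uuid", "3_firstname", "3_lastname", "3_email", 34567, "OPTED_IN", "2020-05-14 19:31:55"),
        ("eater 4_uuid", "4_firstname", "4_lastname", "4_email", 45678, "OPTED_IN", "2020-05-14 17:49:27"),
    ],
)
conn.execute("CREATE TABLE Customer_Remove_TB (hashed_customer TEXT, eaterstatus TEXT)")
conn.executemany(
    "INSERT INTO Customer_Remove_TB VALUES (?, ?)",
    [("eater 1_uuid", "OPTED_OUT"), ("eater 3_uuid", "OPTED_OUT")],
)

# Keep every order row; return NULL for the personal columns when the customer
# has a match in the removal table. DISTINCT prevents duplicate removal entries
# from duplicating order rows in the join.
CLEAN_QUERY = """
    SELECT c.hashed_customer,
           CASE WHEN r.hashed_customer IS NULL THEN c.firstname ELSE NULL END AS firstname,
           CASE WHEN r.hashed_customer IS NULL THEN c.lastname  ELSE NULL END AS lastname,
           CASE WHEN r.hashed_customer IS NULL THEN c.email     ELSE NULL END AS email,
           c.order_id, c.status, c.timestamp
    FROM Customer_TB c
    LEFT JOIN (SELECT DISTINCT hashed_customer FROM Customer_Remove_TB) r
           ON c.hashed_customer = r.hashed_customer
    ORDER BY c.order_id
"""

rows = conn.execute(CLEAN_QUERY).fetchall()
for row in rows:
    print(row)
```

In Python the NULLs come back as None; in a pandas display (for example after toPandas()) they would show as NaN or None, matching the desired output above.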
Tags: sql, database, replace, pyspark, sql-delete
1 answer

Votes: 0

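I think you can update those columns to NULL (or to an empty string "") instead of using DELETE, since DELETE removes whole rows rather than individual column values.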
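A minimal sketch of the UPDATE approach, using Python's built-in sqlite3 module as a stand-in SQL engine (an assumption; this is not Spark). UPDATE ... SET col = NULL keeps every order row and clears only the named columns; using SET firstname = '' and so on instead would store empty strings. Note that in Spark, UPDATE only works on table formats that support row-level updates, such as Delta Lake. On ordinary tables or temporary views Spark rejects it, so there you would instead build a new table from a SELECT that returns NULL for the matched customers.

```python
import sqlite3

# In-memory SQLite database used as a stand-in SQL engine (assumption: not Spark).
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE Customer_TB (hashed_customer TEXT, firstname TEXT, lastname TEXT,"
    " email TEXT, order_id INTEGER, status TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Customer_TB VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("eater 1_uuid", "1_firstname", "1_lastname", "1_email", 12345, "OPTED_IN", "2020-05-14 20:45:15"),
        ("eater 2_uuid", "2_firstname", "2_lastname", "2_email", 23456, "OPTED_IN", "2020-05-14 20:29:22"),
        ("eater 3_uuid", "3_firstname", "3_lastname", "3_email", 34567, "OPTED_IN", "2020-05-14 19:31:55"),
        ("eater 4_uuid", "4_firstname", "4_lastname", "4_email", 45678, "OPTED_IN", "2020-05-14 17:49:27"),
    ],
)
conn.execute("CREATE TABLE Customer_Remove_TB (hashed_customer TEXT, eaterstatus TEXT)")
conn.executemany(
    "INSERT INTO Customer_Remove_TB VALUES (?, ?)",
    [("eater 1_uuid", "OPTED_OUT"), ("eater 3_uuid", "OPTED_OUT")],
)

# UPDATE clears the personal columns in place but keeps each order row;
# DELETE would have removed the matching rows entirely.
conn.execute(
    """
    UPDATE Customer_TB
    SET firstname = NULL, lastname = NULL, email = NULL
    WHERE hashed_customer IN (SELECT hashed_customer FROM Customer_Remove_TB)
    """
)
conn.commit()

rows = conn.execute(
    "SELECT hashed_customer, firstname, lastname, email FROM Customer_TB ORDER BY order_id"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM Customer_TB").fetchone()[0]
for row in rows:
    print(row)
```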
