I'm trying to write a SQL query to use in PySpark that scrubs information from a table. The table I want to modify looks like this:
hashed_customer  firstname    lastname    email    order_id  status    timestamp
eater 1_uuid     1_firstname  1_lastname  1_email  12345     OPTED_IN  2020-05-14 20:45:15
eater 2_uuid     2_firstname  2_lastname  2_email  23456     OPTED_IN  2020-05-14 20:29:22
eater 3_uuid     3_firstname  3_lastname  3_email  34567     OPTED_IN  2020-05-14 19:31:55
eater 4_uuid     4_firstname  4_lastname  4_email  45678     OPTED_IN  2020-05-14 17:49:27
I have another table listing the customers whose details I need to remove from the CUSTOMER ORDERS table. It looks like this:
hashed_customer  eaterstatus
eater 1_uuid     OPTED_OUT
eater 3_uuid     OPTED_OUT
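I'm trying to write a SQL query that removes the first name, last name, and email from the first table if the customer appears in the second table. Something like: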
DELETE firstname, lastname, email FROM Customer_TB
WHERE hashed_customer IN
(SELECT hashed_customer FROM Customer_Remove_TB)
so that the end result would look like:
hashed_customer  firstname    lastname    email    order_id  status    timestamp
eater 1_uuid     NaN          NaN         NaN      12345     OPTED_IN  2020-05-14 20:45:15
eater 2_uuid     2_firstname  2_lastname  2_email  23456     OPTED_IN  2020-05-14 20:29:22
eater 3_uuid     NaN          NaN         NaN      34567     OPTED_IN  2020-05-14 19:31:55
eater 4_uuid     4_firstname  4_lastname  4_email  45678     OPTED_IN  2020-05-14 17:49:27
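I think you can set those columns to null or to an empty string "" instead of using DELETE. DELETE removes entire rows and cannot target individual columns.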
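One way to express "set the columns to null" without modifying the table in place is a SELECT that left-joins the removal table and blanks the three columns with CASE WHEN. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in engine so it can run anywhere, and the sample rows are copied from the question. The query uses only plain SQL (LEFT JOIN, CASE WHEN), so the same text could probably be passed to spark.sql(...) after registering both tables as temp views, but that part is an assumption and is not exercised here. Note that SQL produces NULL (None in Python), not NaN.

```python
import sqlite3

# sqlite3 is a stand-in engine for this sketch; the rows come from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_TB (hashed_customer TEXT, firstname TEXT, "
    "lastname TEXT, email TEXT, order_id INTEGER, status TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Customer_TB VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("eater 1_uuid", "1_firstname", "1_lastname", "1_email", 12345, "OPTED_IN", "2020-05-14 20:45:15"),
        ("eater 2_uuid", "2_firstname", "2_lastname", "2_email", 23456, "OPTED_IN", "2020-05-14 20:29:22"),
        ("eater 3_uuid", "3_firstname", "3_lastname", "3_email", 34567, "OPTED_IN", "2020-05-14 19:31:55"),
        ("eater 4_uuid", "4_firstname", "4_lastname", "4_email", 45678, "OPTED_IN", "2020-05-14 17:49:27"),
    ],
)
conn.execute("CREATE TABLE Customer_Remove_TB (hashed_customer TEXT, eaterstatus TEXT)")
conn.executemany(
    "INSERT INTO Customer_Remove_TB VALUES (?, ?)",
    [("eater 1_uuid", "OPTED_OUT"), ("eater 3_uuid", "OPTED_OUT")],
)

# Blank the personal columns for every customer found in the removal table.
# DISTINCT guards against duplicated rows if a customer is listed twice.
QUERY = """
SELECT c.hashed_customer,
       CASE WHEN r.hashed_customer IS NOT NULL THEN NULL ELSE c.firstname END AS firstname,
       CASE WHEN r.hashed_customer IS NOT NULL THEN NULL ELSE c.lastname  END AS lastname,
       CASE WHEN r.hashed_customer IS NOT NULL THEN NULL ELSE c.email     END AS email,
       c.order_id,
       c.status,
       c.timestamp
FROM Customer_TB AS c
LEFT JOIN (SELECT DISTINCT hashed_customer FROM Customer_Remove_TB) AS r
  ON c.hashed_customer = r.hashed_customer
ORDER BY c.order_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The result is a new row set rather than an in-place change, so in PySpark you would write it back out or overwrite the table yourself. If your table format does support UPDATE, an UPDATE ... SET firstname = NULL, lastname = NULL, email = NULL with the same WHERE ... IN (subquery) filter from the question would be the more direct form.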