Databricks Delta - dynamic REPLACE WHERE condition

Question (votes: 0, answers: 1)

I am trying to use the REPLACE WHERE clause on a Delta table in Azure Databricks. Here is the setup to reproduce the issue:

CREATE TABLE mymaintable (dt DATE, name STRING, YN string) USING delta;

INSERT INTO mymaintable VALUES ('2024-03-01', 'N1', 'Y');
INSERT INTO mymaintable VALUES ('2024-03-01', 'N2', 'N');
INSERT INTO mymaintable VALUES ('2024-03-01', 'N3', 'Y');

INSERT INTO mymaintable VALUES ('2024-03-02', 'N1', 'N');
INSERT INTO mymaintable VALUES ('2024-03-02', 'N2', 'N');
INSERT INTO mymaintable VALUES ('2024-03-02', 'N3', 'N');

INSERT INTO mymaintable VALUES ('2024-03-03', 'N1', 'Y');
INSERT INTO mymaintable VALUES ('2024-03-03', 'N2', 'Y');
INSERT INTO mymaintable VALUES ('2024-03-03', 'N3', 'Y');

CREATE TABLE myincrementaltable (dt DATE, name STRING, YN string) USING delta;

INSERT INTO myincrementaltable VALUES ('2024-03-03', 'N1', 'X');
INSERT INTO myincrementaltable VALUES ('2024-03-03', 'N2', 'Z');
INSERT INTO myincrementaltable VALUES ('2024-03-04', 'Q1', 'X');
INSERT INTO myincrementaltable VALUES ('2024-03-04', 'Q2', 'Z');

That is the setup. Now I want to replace rows in the main table with the contents of the incremental table.

This works:

INSERT INTO mymaintable
REPLACE WHERE dt >= "2024-03-03"
TABLE myincrementaltable

But this does not:

INSERT INTO mymaintable
REPLACE WHERE dt >= (SELECT MAX(dt) from mymaintable)
TABLE myincrementaltable

It fails with the error:

AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view `mymaintable` cannot be found. Verify the spelling and correctness of the schema and catalog.

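Is there a way to do this?

  • Does any Spark configuration need to be adjusted?
  • Or is there a way to pass the maximum date dynamically in a SQL statement?

Thanks!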

databricks azure-databricks delta
1 Answer
0
votes

I am not sure about doing this directly in a single SQL statement, but I would try the following in a separate Python cell:

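# Look up the current maximum date first, as a separate query.
max_dt = spark.sql("SELECT MAX(dt) FROM sandbox.mymaintable").collect()[0][0]

# Inline the value as a literal, since the subquery form fails in REPLACE WHERE.
sql_string = f"""
INSERT INTO sandbox.mymaintable
REPLACE WHERE dt >= '{max_dt}'
TABLE sandbox.myincrementaltable"""

spark.sql(sql_string)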
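If you want to reuse this, the same two-step idea could be wrapped in a small helper. This is only a sketch: the function names `build_replace_where_sql` and `replace_from_max_date` are made up for illustration, the column name `dt` is taken from the question, and raising on an empty target table is my own assumption about what you would want in that case (MAX(dt) comes back as NULL there).

```python
from datetime import date


def build_replace_where_sql(target: str, source: str, cutoff: date) -> str:
    """Build an INSERT ... REPLACE WHERE statement with a literal cutoff date."""
    if not isinstance(cutoff, date):
        raise TypeError(f"cutoff must be a date, got {type(cutoff).__name__}")
    return (
        f"INSERT INTO {target}\n"
        f"REPLACE WHERE dt >= '{cutoff.isoformat()}'\n"
        f"TABLE {source}"
    )


def replace_from_max_date(spark, target: str, source: str) -> str:
    """Read MAX(dt) from the target, then run the replace as a second statement.

    `spark` is expected to be the notebook's SparkSession (hypothetical helper,
    not part of any Databricks API).
    """
    cutoff = spark.sql(f"SELECT MAX(dt) FROM {target}").collect()[0][0]
    if cutoff is None:
        # Assumption: an empty target table is treated as an error here,
        # because there is no date to build the predicate from.
        raise ValueError(f"{target} has no rows, so MAX(dt) is NULL")
    statement = build_replace_where_sql(target, source, cutoff)
    spark.sql(statement)
    return statement
```

In a notebook you would call it as `replace_from_max_date(spark, "sandbox.mymaintable", "sandbox.myincrementaltable")`. Keep in mind that the table names are interpolated into the SQL text as-is, so only pass names you control.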

Give it a try. Hope it helps.
