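How can I write the following Oracle SQL query as equivalent PySpark SQL? It is not supported when run through spark.sql(*query) because of the nested subquery. Is there also a way to write it using the PySpark DataFrame API?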
SELECT TABLE1.COL1
FROM TABLE1
WHERE COL2 = (
SELECT MAX(COL2)
FROM TABLE1
WHERE TABLE1.COL3 = TABLE2.COL3 OR TABLE1.COL4 = TABLE2.COL4
)
TABLE1 has columns COL1, COL2, COL3, COL4
TABLE2 has columns COL3, COL4
sql_request = "(select TABLE1.COL1 FROM TABLE1 WHERE COL2 = (SELECT MAX(COL2) FROM TABLE1 WHERE TABLE1.COL3 = TABLE2.COL3 OR TABLE1.COL4 = TABLE2.COL4))"
df_request = spark.read.jdbc(url=url,table=sql_request,properties=db_properties)