Running source-side SQL queries against multiple tables in a single Glue job, with S3 naming that corresponds to the tables

Question

sql_list = ['(select * from table1 where rownum <= 100) alias1','(select * from table2 where rownum <= 100) alias2']

for sql_statement in sql_list:
    df = (
        spark.read.format("jdbc")
        .option("driver", jdbc_driver_name)
        .option("url", db_url)
        .option("dbtable", sql_statement)
        .option("user", db_username)
        .option("password", db_password)
        .option("fetchSize", 100000)
        .load()
    )
    df.write.format("parquet").mode("overwrite").save("s3://s3-location/" + sql_statement)

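The source is an Oracle database.

I am able to run the list of queries and store the results in S3 as Parquet, but each output is named after the full string listed in sql_list. I want to store the data separately in S3, named alias1 and alias2 respectively.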


Answer 1

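Consider using a dictionary instead of a list: it is tidier and more flexible, because each query can be keyed by the name you want to use for its output in S3.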
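A minimal sketch of that idea, assuming the dictionary keys are the aliases and the values are the bare queries. The names `sql_queries`, `dbtable_for`, `s3_path_for`, `export_all`, and `jdbc_options` are hypothetical and only for illustration; `s3://s3-location/` is the placeholder from the question.

```python
# Hypothetical mapping: alias -> query to run on the Oracle source.
sql_queries = {
    "alias1": "select * from table1 where rownum <= 100",
    "alias2": "select * from table2 where rownum <= 100",
}

S3_PREFIX = "s3://s3-location/"  # placeholder bucket taken from the question


def dbtable_for(alias, query):
    """Wrap a query as an aliased subquery for the JDBC 'dbtable' option."""
    return "({}) {}".format(query, alias)


def s3_path_for(alias, prefix=S3_PREFIX):
    """Build the output location for one alias."""
    return prefix.rstrip("/") + "/" + alias


def export_all(spark, queries, jdbc_options):
    """Read each query over JDBC and write it to its own Parquet location.

    jdbc_options is assumed to be a dict holding driver, url, user,
    password and fetchSize, as in the question.
    """
    for alias, query in queries.items():
        df = (
            spark.read.format("jdbc")
            .options(**jdbc_options)
            .option("dbtable", dbtable_for(alias, query))
            .load()
        )
        # The write stays inside the loop so every query gets its own output.
        df.write.format("parquet").mode("overwrite").save(s3_path_for(alias))
```

Inside the Glue job this would be called as `export_all(spark, sql_queries, jdbc_options)`, so the data for the first query would land under `s3://s3-location/alias1` and the second under `s3://s3-location/alias2`.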
