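I have a dbt data pipeline that creates many Athena tables backed by a large number of files, and I noticed that every run takes a long time even for simple queries. Looking in dbt.log, I found this query: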
WITH views AS (
    select
        table_catalog as database,
        table_name as name,
        table_schema as schema
    from "awsdatacatalog".INFORMATION_SCHEMA.views
    where table_schema = LOWER('graphs_db')
), tables AS (
    select
        table_catalog as database,
        table_name as name,
        table_schema as schema
    from "awsdatacatalog".INFORMATION_SCHEMA.tables
    where table_schema = LOWER('graphs_db')
    -- Views appear in both `tables` and `views`, so excluding them from tables
    EXCEPT
    select * from views
)
select views.*, 'view' AS table_type FROM views
UNION ALL
select tables.*, 'table' AS table_type FROM tables
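dbt probably runs it up front to find out which tables and views already exist. Either way, this query takes 5 minutes to run. My pipeline has several dbt steps, so this slows it down considerably. Is this normal? Is there any way to optimize it?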
You can take a look at Tips to improve the Athena Query Performance, especially point no. 4.
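For the catalog-listing query from dbt.log specifically, one thing worth trying is to fetch the same information straight from the AWS Glue Data Catalog instead of going through INFORMATION_SCHEMA. The sketch below is only an illustration under assumptions: it assumes your Athena catalog is backed by Glue, that boto3 is installed with credentials configured, and that views are stored with `TableType` equal to `VIRTUAL_VIEW`. The helper names `classify_relations` and `list_relations` are made up for this example and are not part of dbt. It does not change what dbt itself runs; it only lets you compare how long the same listing takes via the Glue API.

```python
from typing import Any, Dict, Iterable, List


def classify_relations(
    pages: Iterable[Dict[str, Any]],
    catalog: str = "awsdatacatalog",
) -> List[Dict[str, str]]:
    """Flatten Glue get_tables pages into rows shaped like the dbt.log query output."""
    rows: List[Dict[str, str]] = []
    for page in pages:
        for table in page.get("TableList", []):
            # Assumption: Glue marks views with TableType == "VIRTUAL_VIEW";
            # everything else is reported as a table.
            kind = "view" if table.get("TableType") == "VIRTUAL_VIEW" else "table"
            rows.append(
                {
                    "database": catalog,
                    "name": table["Name"],
                    "schema": table.get("DatabaseName", ""),
                    "table_type": kind,
                }
            )
    return rows


def list_relations(schema: str) -> List[Dict[str, str]]:
    """Hypothetical helper: list the relations of one Glue database via boto3."""
    import boto3  # imported lazily so classify_relations can be used without AWS access

    glue = boto3.client("glue")
    # get_tables is paginated, so iterate over every page of results.
    pages = glue.get_paginator("get_tables").paginate(DatabaseName=schema)
    return classify_relations(pages)
```

Calling `list_relations("graphs_db")` and timing it against the 5-minute query should show whether the slowness comes from INFORMATION_SCHEMA itself or from the size of the catalog.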