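I have a dbt-core project with 2 tables (1 incremental table and 1 aggregated incremental table).
Because dbt only logs the direct *.sql files and not the incremental macro magic that uses the MERGE statement, most of the cost is missing from the bytes_billed recorded in /target/run_results.json when I run the project.
To solve this, I need a way to get all the jobs that are created when the dbt project runs.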
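For reference, this is roughly how the figure that dbt itself reports can be read back. It is a minimal sketch, assuming each entry in the "results" array of run_results.json carries an adapter_response object with a bytes_billed field (which is what I see with the BigQuery adapter). The function name summarize_bytes_billed is made up for illustration.

```python
import json
from pathlib import Path


def summarize_bytes_billed(run_results_path):
    """Return ({unique_id: bytes_billed}, total_bytes_billed) for one dbt run.

    Assumes the run_results.json layout written by dbt with the BigQuery
    adapter: a top-level "results" list whose items may contain
    adapter_response.bytes_billed.
    """
    data = json.loads(Path(run_results_path).read_text(encoding="utf-8"))
    per_node = {}
    for result in data.get("results", []):
        # adapter_response can be empty or missing (e.g. skipped nodes),
        # so fall back to 0 bytes for those entries.
        adapter_response = result.get("adapter_response") or {}
        node_id = result.get("unique_id", "<unknown>")
        per_node[node_id] = int(adapter_response.get("bytes_billed") or 0)
    return per_node, sum(per_node.values())
```

Calling summarize_bytes_billed("target/run_results.json") after a run gives the per-model numbers and their sum, which can then be compared with what BigQuery reports for the same invocation.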
dbt_project.yml:
models:
  dbt_tracking:
    near_realtime:
      +tags: near_realtime
      +labels:
        source: dbt_near_realtime
      +materialized: table
Config of the models/near_realtime/stg_near_realtime_events.sql file:
{{
    config(
        labels={"task_id": "near_realtime"},
        materialized="incremental",
        incremental_strategy="merge",
        on_schema_change="append_new_columns",
        partition_by={
            "field": "derived_tstamp",
            "data_type": "timestamp",
            "granularity": "hour",
        },
        partition_expiration_days=1,
        cluster_by=["event_name"],
    )
}}
Config of the models/near_realtime/int_near_realtime_aggs.sql file:
{{
    config(
        labels={"task_id": "near_realtime"},
    )
}}
Executed command:
dbt run --select tag:near_realtime
But when I search for my labels in
`my_project_id.region-eu.INFORMATION_SCHEMA.JOBS_BY_USER`
the only label key I get back from this query:
select distinct
    labels[SAFE_OFFSET(0)].key
from
    `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS_BY_USER`
where
    timestamp_trunc(creation_time, day) = timestamp(current_date())
is this one:
dbt_invocation_id
How can I query the jobs behind my queries so that I can see the total_bytes_billed that my dbt project run generated, for cost calculation?
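As far as I can tell, the labels config is applied to the tables and views dbt creates rather than to the query jobs. To label the jobs themselves, you can set a query comment with job-label enabled in dbt_project.yml:
query-comment:
  job-label: True
  comment: "{{ var('query_label', 'dbt_run') }}"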
You can then set the query label per run like this:
dbt run --select tag:my_tag --vars '{"query_label": "here_is_my_label"}'
After that, you can analyze the total_bytes_billed of each job per query label in BQ like this:
with
dbt_cloud_logs as (
    select
        datetime(creation_time, "Europe/Berlin") as creation_time
        , (select unnested_labels.value from unnest(labels) as unnested_labels where unnested_labels.key = "dbt_invocation_id") as dbt_invocation_id
        , (select unnested_labels.value from unnest(labels) as unnested_labels where unnested_labels.key = "query_comment") as dbt_query_label
        , total_bytes_billed
    from
        `region-eu.INFORMATION_SCHEMA.JOBS`
    where
        1=1
        and user_email = "your_service_account_or_email_address"
        and timestamp_trunc(creation_time, day, "Europe/Berlin") >= timestamp("2024-05-23", "Europe/Berlin")
)
select * from dbt_cloud_logs order by dbt_query_label, dbt_invocation_id, creation_time desc
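If you want one number per label instead of one row per job, the rows can be summed up afterwards. The following is only a sketch, assuming the job rows have already been fetched into plain Python dicts with a labels list of {"key", "value"} entries and a total_bytes_billed field, mirroring the columns of INFORMATION_SCHEMA.JOBS. The helper names label_value and bytes_billed_by_label are hypothetical, and how the rows are fetched is left out on purpose.

```python
from collections import defaultdict


def label_value(labels, key):
    """Return the value of the first label with the given key, or None."""
    for label in labels or []:
        if label.get("key") == key:
            return label.get("value")
    return None


def bytes_billed_by_label(jobs, key="query_comment"):
    """Sum total_bytes_billed per value of the given job label.

    Jobs that do not carry the label are grouped under None.
    """
    totals = defaultdict(int)
    for job in jobs:
        value = label_value(job.get("labels"), key)
        # total_bytes_billed can be NULL (None) for jobs that billed nothing.
        totals[value] += job.get("total_bytes_billed") or 0
    return dict(totals)
```

Passing key="dbt_invocation_id" instead groups the same rows by dbt run, which is handy for comparing against the total from a single invocation.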