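When I run a query job from the bq command-line tool with the --batch option, a single statement gets BATCH priority. With multiple statements, however, the parent SCRIPT job is assigned BATCH while the individual statements are assigned INTERACTIVE priority. The same is true for calls to stored procedures. The priorities were observed in the information_schema.jobs view. The Python API shows the same behavior.
Shouldn't the child jobs also get BATCH priority when the parent script job runs with BATCH priority? I haven't found anything in the documentation that explains this. Maybe there is a reason for it.
Steps to reproduce: bq query --batch --use_legacy_sql=False "select current_timestamp();" -- this produces one entry in INFORMATION_SCHEMA.JOBS: QUERY/SELECT/BATCH
bq query --batch --use_legacy_sql=False "select current_timestamp(); select current_timestamp();" -- this produces 3 entries: the parent SCRIPT job is assigned BATCH, but the two child SELECT jobs get INTERACTIVE. (See image.)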
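Since the question says the Python API behaves the same way, here is a minimal sketch of how that could be checked with the google-cloud-bigquery client. It assumes default credentials and a default project are configured; `describe` and `run_batch_script` are hypothetical helper names, not part of the library, and the sketch only prints what the API reports rather than asserting any particular result.

```python
def describe(job):
    """Format the fields that are also visible in INFORMATION_SCHEMA.JOBS."""
    statement_type = getattr(job, "statement_type", None)
    priority = getattr(job, "priority", None)
    return f"{job.job_id} {statement_type} {priority}"


def run_batch_script(sql):
    """Submit `sql` with BATCH priority and describe the parent and child jobs.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    """
    # Imported lazily so that `describe` can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)

    parent = client.query(sql, job_config=config)
    parent.result()  # wait until the whole script has finished

    lines = ["parent: " + describe(parent)]
    # Each statement of a multi-statement query runs as a child job.
    for child in client.list_jobs(parent_job=parent):
        lines.append("child:  " + describe(child))
    return lines
```

Calling `run_batch_script("select current_timestamp(); select current_timestamp();")` and printing the returned lines should make it easy to compare the priority reported for the parent SCRIPT job with the priority reported for each child statement.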
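A job can show BATCH priority even when your query ends up being scheduled with INTERACTIVE priority. If a batch query has not started or been queued within 24 hours, it is changed to interactive priority, which makes your query run as soon as possible. BATCH and INTERACTIVE queries use the same resources.
You can refer to this link for details.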
/* Still a problem. In Google BigQuery, when a multipart SQL script is initiated in BATCH mode, EVERY statement in it is prioritized as INTERACTIVE.
Users are misled because the job_id for the script is listed as running in BATCH, but each statement in the script is charged against
the INTERACTIVE quota.
This billing "sleight-of-hand" is not described in any of the documentation around BATCH/INTERACTIVE.
https://cloud.google.com/bigquery/docs/running-queries#batch
As a result, all SQL that meets any of the following conditions is charged against INTERACTIVE quotas:
- All multipart SQL statements
- Any SQL that declares a variable (and is therefore treated as a multipart SQL script)
- All dynamic SQL
This overcharge occurs regardless of how the SQL is initiated, whether from the Console, the API, or a scheduled task.
************************************
To run this code:
1) From the BigQuery Console / BigQuery Studio:
- ***Change the session to Batch*** by choosing: More > Query settings > Resource management > Job priority > Batch
2) From the API in any language such as Python, C#, Go, etc.:
https://cloud.google.com/bigquery/docs/running-queries#batch
3) From the bq command-line utility (single quotes keep a POSIX shell from interpreting the backticks):
bq query --use_legacy_sql=false --batch --format csv 'SELECT 1+1;SELECT 2+2;SELECT user_email, start_time, parent_job_id, job_id, priority, statement_type, query FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_USER` WHERE start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 SECOND) ORDER BY user_email, start_time;'
*/
SELECT 1+1;
SELECT 2+2;
SELECT user_email, start_time, parent_job_id, job_id, priority, statement_type, query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
-- start_time is a TIMESTAMP, so compare it using TIMESTAMP_SUB
WHERE start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 SECOND)
ORDER BY user_email, start_time;
/* One would expect the results of this final query to show that all statements ran with priority=BATCH.
In practice only the FIRST row (the submission/acceptance of the SQL script) has priority=BATCH; the actual SQL statements all run with priority=INTERACTIVE. */
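The comparison described above (parent row versus child rows) can also be done mechanically once the rows of the final query are exported. The following is a rough sketch in Python, assuming the rows are available as dictionaries keyed by the column names used in the query; `find_priority_mismatches` is a hypothetical helper and the sample job IDs are made up to mirror the behavior reported in this answer, not taken from a real run.

```python
def find_priority_mismatches(rows):
    """Return (child_job_id, parent_priority, child_priority) for each child
    job whose priority differs from that of its parent job."""
    priority_by_job = {row["job_id"]: row["priority"] for row in rows}
    mismatches = []
    for row in rows:
        parent_id = row.get("parent_job_id")
        # Top-level jobs have no parent; skip parents outside the result set.
        if parent_id not in priority_by_job:
            continue
        parent_priority = priority_by_job[parent_id]
        if row["priority"] != parent_priority:
            mismatches.append((row["job_id"], parent_priority, row["priority"]))
    return mismatches


# Hypothetical rows shaped like the output of the JOBS_BY_USER query above.
sample_rows = [
    {"job_id": "script_1", "parent_job_id": None, "priority": "BATCH"},
    {"job_id": "script_1_0", "parent_job_id": "script_1", "priority": "INTERACTIVE"},
    {"job_id": "script_1_1", "parent_job_id": "script_1", "priority": "INTERACTIVE"},
]

for job_id, parent_priority, child_priority in find_priority_mismatches(sample_rows):
    print(f"{job_id}: parent={parent_priority}, child={child_priority}")
```

An empty result would mean every child statement inherited the priority of its script; any returned tuple identifies a statement that did not.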