BigQuery child jobs get INTERACTIVE priority even when the parent job is BATCH

Question (votes: 0, answers: 2)

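When a query job is run from the bq command-line tool with the --batch option, a single statement gets BATCH priority. If the input is a group of statements, however, the parent SCRIPT job is assigned BATCH while the individual statements are assigned INTERACTIVE priority. The same is true of calls to stored procedures. The priorities were observed in the INFORMATION_SCHEMA.JOBS view. The Python API behaves the same way.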

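When the parent script job runs with BATCH priority, shouldn't the child jobs get BATCH priority as well? I have not found anything in the documentation that explains this. Perhaps there is a reason for it.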

Steps to reproduce:

bq query --batch --use_legacy_sql=False "select current_timestamp();"
-- This produces one entry in INFORMATION_SCHEMA.JOBS: QUERY/SELECT/BATCH

bq query --batch --use_legacy_sql=False "select current_timestamp(); select current_timestamp();"
-- This produces 3 entries: the parent SCRIPT job is assigned BATCH, but the two child SELECT jobs get INTERACTIVE. (see image)

Note: without the --batch flag, all three entries in JOBS are INTERACTIVE.
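The question says the Python API shows the same behavior. A minimal sketch of that check follows, assuming the google-cloud-bigquery client library is installed and default credentials are configured. The helper names submit_batch_script and child_priorities are made up for illustration and are not part of the library.

```python
from typing import Any, List, Tuple


def submit_batch_script(sql: str) -> Tuple[Any, Any]:
    """Submit `sql` with BATCH priority and wait for it to finish."""
    # Imported lazily so child_priorities() can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes a default project and credentials
    config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.BATCH)
    parent = client.query(sql, job_config=config)
    parent.result()  # block until the whole script has completed
    return client, parent


def child_priorities(client: Any, parent: Any) -> List[Tuple[str, str, str]]:
    """Return (job_id, statement_type, priority) for each child job of `parent`."""
    return [
        (job.job_id, job.statement_type, job.priority)
        for job in client.list_jobs(parent_job=parent)
    ]
```

Calling submit_batch_script("select current_timestamp(); select current_timestamp();") and passing the returned client and parent job to child_priorities should list the child jobs; comparing their priority with parent.priority would show whether the mismatch described above also appears through the API.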

google-cloud-platform google-bigquery
2 Answers
1
vote

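A query can end up with INTERACTIVE priority even though it was scheduled with BATCH job priority. If the query has not started within 24 hours, its priority is changed to interactive, so that it runs as soon as possible. BATCH and INTERACTIVE queries use the same resources.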

See this link for reference.


0
votes
/* Still a problem. In Google BigQuery, when a multi-statement SQL script is submitted in BATCH mode, EVERY statement in it runs with INTERACTIVE priority.
   Users are misled because the job_id for the script is listed as running in BATCH, but each statement in the script is charged against
   the INTERACTIVE quota.
 
   This billing "sleight of hand" is not described in any of the documentation on BATCH/INTERACTIVE:
   https://cloud.google.com/bigquery/docs/running-queries#batch
 
   As a result, all SQL that meets any of the following conditions is charged against INTERACTIVE quotas:
   - All multi-statement SQL scripts
   - Any SQL that declares a variable (and is therefore treated as a multi-statement script)
   - All dynamic SQL
   This overcharge occurs regardless of how the SQL is initiated: Console, API, or scheduled task.
 
   ************************************
   To run this code:
   1) From the BigQuery Console / BigQuery Studio:
   - ***Change the session to Batch*** by choosing: More > Query settings > Resource management > Job priority > Batch
 
   2) From the API in any language such as Python, C#, Go, etc.:
   https://cloud.google.com/bigquery/docs/running-queries#batch
 
   3) From the API via the bq command-line utility (single-quoted so that the shell does not interpret the backticks):
   bq query --use_legacy_sql=false --batch --format csv 'SELECT 1+1;SELECT 2+2;SELECT user_email, start_time, parent_job_id, job_id, priority, statement_type, query FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER WHERE start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 SECOND) ORDER BY user_email, start_time;'
 
*/
SELECT 1+1;
SELECT 2+2;
 
-- start_time is a TIMESTAMP, so compare it against TIMESTAMP_SUB rather than DATETIME_SUB.
SELECT user_email, start_time, parent_job_id, job_id, priority, statement_type, query
  FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
  WHERE start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 SECOND)
  ORDER BY user_email, start_time;
 
/* One would expect the results of this final query to show every statement with priority=BATCH.
   In practice only the FIRST row (the submission/acceptance of the SQL script) has priority=BATCH; the actual SQL statements all run with priority=INTERACTIVE. */