BigQuery cost estimated from total_slot_ms differs significantly from the cost billed through the BigQuery Reservation API


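I use the query below to calculate the total cost of slot usage. The total_cost it computes is completely different from what I see in the BigQuery Reservation API. I am on the Standard edition. Am I missing something?

I also estimated storage costs, but they come nowhere near covering the difference. Any help would be appreciated.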

        SELECT (SUM(TOTAL_SLOT_MS)*0.04)/(1000*60*60) AS TOTAL_COST
        , MAX(jobstage_max_slots) AS MAX_SLOTS
        , AVG(job_avg_slots) AS AVG_SLOTS
        FROM
        (
        SELECT
        project_id,
        job_id,
        reservation_id,
        EXTRACT(DATE FROM creation_time) AS creation_date,
        TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_seconds,
        job_type,
        user_email,
        total_bytes_billed,

        -- Average slot utilization per job  

        SAFE_DIVIDE(job.total_slot_ms,(TIMESTAMP_DIFF(job.end_time, job.start_time, MILLISECOND))) AS  job_avg_slots,
        query,

        -- Determine the max number of slots used at ANY stage in the query.
        -- The average slots might be 55. But a single stage might spike to 2000 slots.
        -- This is important to know when estimating number of slots to purchase.
        job.total_slot_ms,

        MAX(SAFE_DIVIDE(unnest_job_stages.slot_ms,unnest_job_stages.end_ms - unnest_job_stages.start_ms)) AS jobstage_max_slots,

        -- Check if there's a job that requests more units of works (slots). If so you need more slots.
        -- estimated_runnable_units = Units of work that can be scheduled immediately.
        -- Providing additional slots for these units of work accelerates the query,
        -- if no other query in the reservation needs additional slots.

        MAX(unnest_timeline.estimated_runnable_units) AS estimated_runnable_units
        FROM `region-us`.INFORMATION_SCHEMA.JOBS AS job
        CROSS JOIN UNNEST(job_stages) as unnest_job_stages
        CROSS JOIN UNNEST(timeline) AS unnest_timeline
        WHERE project_id = 'open-bridge-bg'
        -- and job_type = 'QUERY'
        --  AND statement_type != 'SCRIPT'
        AND DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
        GROUP BY 1,2,3,4,5,6,7,8,9,10,11
        ORDER BY job_id
        );


https://cloud.google.com/bigquery/docs/information-schema-jobs#calculate_average_slot_utilization

I expected the total cost to be roughly the same as what I see on my bill.

google-bigquery billing slots
1 Answer

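One reason for the discrepancy may be that BigQuery's slot-based pricing scales up and down in increments of 100 slots. So even if the total_slot_ms value indicates that a query averaged fewer than 100 slots over its duration, the autoscaler will still scale to 100. I have been able to get a closer cost estimate by rounding each job's average slot usage up to the next multiple of 100. Here is an example query: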

DECLARE standard_edition_cost_per_slot_hour FLOAT64 DEFAULT 0.04;
DECLARE ms_per_hour INT64 DEFAULT 1000 * 3600;

WITH raw_data AS (
  SELECT 
    job_id,
    total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, millisecond) as job_duration_ms
  FROM `region-us`.INFORMATION_SCHEMA.JOBS
  WHERE
    creation_time BETWEEN
      TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
      AND TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    AND cache_hit != true
    AND total_slot_ms IS NOT NULL
),

job_w_avg_slot_usage AS (
  SELECT
    *,
    -- SAFE_DIVIDE returns NULL instead of raising an error when a job's duration is 0 ms
    SAFE_DIVIDE(total_slot_ms, job_duration_ms) AS avg_slot_usage
  FROM raw_data
),

job_w_avg_slot_usage_100 AS (
  SELECT
    *,
    CEIL(avg_slot_usage / 100) * 100 AS avg_slot_usage_100
  FROM job_w_avg_slot_usage
),

job_w_slot_hr_100 AS (
  SELECT
    *,
    job_duration_ms * avg_slot_usage_100 / ms_per_hour AS slot_hr_100
  FROM job_w_avg_slot_usage_100
),

job_w_cost AS (
  SELECT
    *,
    slot_hr_100 * standard_edition_cost_per_slot_hour AS standard_edition_cost
  FROM job_w_slot_hr_100
)

SELECT SUM(standard_edition_cost) as total_cost FROM job_w_cost
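
The effect of the rounding can be checked by hand for a single job. The following Python sketch mirrors the arithmetic of the query above under the same assumptions (a rate of 0.04 per slot-hour and a 100-slot autoscaling step). The example job (60 seconds at an average of 55 slots) and the function names are hypothetical, chosen only for illustration.

```python
import math

STANDARD_EDITION_COST_PER_SLOT_HOUR = 0.04  # rate used in the question and the answer
MS_PER_HOUR = 1000 * 3600
SLOT_INCREMENT = 100  # autoscaler step size assumed by the answer


def raw_cost(total_slot_ms: float) -> float:
    """Cost if billing tracked total_slot_ms exactly (the question's formula)."""
    return total_slot_ms / MS_PER_HOUR * STANDARD_EDITION_COST_PER_SLOT_HOUR


def rounded_cost(total_slot_ms: float, job_duration_ms: float) -> float:
    """Cost after rounding the job's average slot usage up to the next 100 slots."""
    if job_duration_ms <= 0:
        # Assumption: a zero-length job contributes nothing to the estimate.
        return 0.0
    avg_slots = total_slot_ms / job_duration_ms
    billed_slots = math.ceil(avg_slots / SLOT_INCREMENT) * SLOT_INCREMENT
    slot_hours = job_duration_ms * billed_slots / MS_PER_HOUR
    return slot_hours * STANDARD_EDITION_COST_PER_SLOT_HOUR


# Hypothetical job: 60 seconds long, averaging 55 slots.
job_duration_ms = 60_000
total_slot_ms = 55 * job_duration_ms

print(f"raw estimate:     {raw_cost(total_slot_ms):.4f}")
print(f"rounded estimate: {rounded_cost(total_slot_ms, job_duration_ms):.4f}")
```

For this job the rounded estimate bills 100 slots rather than 55, so it is 100/55 (about 1.8) times the raw figure. A workload made up of many short jobs with low average slot usage could show a gap of similar size between the two estimates.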