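Consider a BigQuery view that has a column of type
ARRAY<STRUCT<col1 FLOAT, col2 FLOAT>>
called X.
What is the best way to retrieve those rows together with an additional column that is a computation over the elements of the X array? Can this be done with a BigQuery stored procedure?
It would be great to have something like this: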
select computation(X), * from something
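where computation(X) would iterate over the elements of the X array and sum them up with some extra rules.
On this point, BigQuery functions seem to handle only scalar types, not structs or arrays, so they do not look applicable here, since the data type is an ARRAY. One of the requirements is to keep this query usable from the BigQuery query console, so solutions such as external scripts (for example Python) should be avoided.
Is there an example of a procedure that has a select inside it and returns additional information? Something like a map function over a query, or a kind of filter similar to the computation(X) example above.
As requested, to give more context, I have a column of type: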
ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
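It contains the GCP pricing tiers. I have to iterate over all of them and calculate the final price.
FLOAT64 is a placeholder for a data type that is reliable for currency. I am still looking for one in BigQuery.
I would like to achieve something like the following, by implementing a function called get_tiers_total_expense: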
-- Example: two tiers.
-- Tier 1: starts at a usage of 0 and runs up to 20; each unit used costs 10.
-- Tier 2: starts at a usage of 20; each unit used costs 5.
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));
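In the end I followed @jaytiger's suggestion and created a BigQuery UDF.
I have to iterate over all the GCP tiers to calculate the final cost. I use FLOAT as a placeholder for currency. I still need to figure out which data type is better suited to storing such amounts in BigQuery...
Here is a partial solution: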
CREATE OR REPLACE FUNCTION get_tiers_total_expense(
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
)
RETURNS FLOAT64
LANGUAGE js AS """
// takes the tiers and returns the final cost
""";
-- Example: two tiers.
-- Tier 1: starts at a usage of 0 and runs up to 20; each unit used costs 10.
-- Tier 2: starts at a usage of 20; each unit used costs 5.
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));
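For what it's worth, the JS body that is left as a comment in the partial solution could be filled in roughly along these lines. This is only a sketch under assumptions, not the actual implementation: the function above takes just the tiers, so the usage argument here is hypothetical, as are the names tieredExpense and exampleTiers. It also assumes that a tier ends where the next one starts and that usd_amount is the price per pricing_unit_quantity units.

```javascript
// Hypothetical helper: `usage` is an assumed extra input that the
// original get_tiers_total_expense signature does not have.
function tieredExpense(tiers, usage) {
  // Sort a copy by start_usage_amount so each tier ends where the next begins.
  const sorted = [...tiers].sort(
    (a, b) => a.start_usage_amount - b.start_usage_amount
  );
  let total = 0;
  for (let i = 0; i < sorted.length; i++) {
    const tier = sorted[i];
    const start = tier.start_usage_amount;
    // Assumption: the last tier is open-ended.
    const end = i + 1 < sorted.length ? sorted[i + 1].start_usage_amount : Infinity;
    // Units of usage that fall inside this tier (never negative).
    const unitsInTier = Math.max(0, Math.min(usage, end) - start);
    // Assumption: usd_amount is charged per pricing_unit_quantity units.
    total += (unitsInTier / tier.pricing_unit_quantity) * tier.usd_amount;
  }
  return total;
}

// The two tiers from the example: 10 per unit up to 20, then 5 per unit.
const exampleTiers = [
  { pricing_unit_quantity: 1.0, start_usage_amount: 0.0, usd_amount: 10.0, account_currency_amount: 9.0 },
  { pricing_unit_quantity: 1.0, start_usage_amount: 20.0, usd_amount: 5.0, account_currency_amount: 4.0 },
];
```

With these two tiers, tieredExpense(exampleTiers, 30) charges 20 units at 10 and 10 units at 5, so it returns 250. Inside a BigQuery JS UDF the struct fields arrive as object properties with the same names, so the loop could be pasted into the function body once the usage value is passed in as a parameter.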
You could consider the following. (I would suggest a SQL UDF rather than a JS UDF.)
-- sample data
CREATE TEMP TABLE tiers AS
SELECT 1.0 pricing_unit_quantity, 0.0 start_usage_amount,
10.0 usd_amount, 9.0 account_currency_amount UNION ALL
SELECT 1.0, 20.0, 5.0, 4.0;
-- define UDFs here
CREATE TEMP FUNCTION get_tiers_total_expense (
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64,
usd_amount FLOAT64,
account_currency_amount FLOAT64>>
) AS ((
  -- takes the tiers and returns the final cost
  -- -> adjust the aggregation logic to your needs
SELECT SUM(pricing_unit_quantity * usd_amount * account_currency_amount)
FROM UNNEST(tiers) tier
));
-- sample query here using UDFs and sample data
SELECT get_tiers_total_expense(
ARRAY_AGG(
STRUCT(
pricing_unit_quantity,
start_usage_amount,
usd_amount,
account_currency_amount
)
)
) AS get_tiers_total_expense
FROM tiers t
--GROUP BY -- later you can change group-by columns depending on your use cases.
-- query result
+-------------------------+
| get_tiers_total_expense |
+-------------------------+
| 110.0 |
+-------------------------+