BigQuery: find the rolling latest value per group, then the minimum across groups

Question (votes: 0, answers: 1)

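Suppose we have product, seller, price, and stock change data in BigQuery, as shown below. The data comes from change data capture (CDC) on a product listing table.

A product can have several sellers, and each seller may set a different price for it. When a seller runs out of stock, we ignore that seller in the minimum-price calculation. So a product's minimum price at any given time depends on which sellers have stock.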


WITH a AS
(
  SELECT 1 AS dt, 'p' as product_id, 'A' AS seller, 10 as stock, 100 as price
  UNION ALL
  SELECT 2 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 120 as price
  UNION ALL
  SELECT 3 AS dt, 'p' as product_id, 'C' AS seller, 10 as stock, 150 as price
  UNION ALL
  SELECT 4 AS dt, 'p' as product_id, 'D' AS seller, 10 as stock, 300 as price
  UNION ALL
  SELECT 5 AS dt, 'p' as product_id, 'E' AS seller, 10 as stock, 400 as price
  UNION ALL
  SELECT 6 AS dt, 'p' as product_id, 'F' AS seller, 10 as stock, 500 as price
  UNION ALL
  SELECT 7 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 600 as price
  UNION ALL
  SELECT 8 AS dt, 'p' as product_id, 'A' AS seller, 0 as stock, 100 as price
  UNION ALL
  SELECT 9 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 110 as price
  UNION ALL
  SELECT 10 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 190 as price
  UNION ALL
  SELECT 11 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 800 as price
  UNION ALL
  SELECT 12 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 100 as price
)

SELECT *
FROM a

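At each point in time, I want to compute the product's minimum price and the seller offering that price.

Desired output:

dt product_id min_price seller_with_min_price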
1 p 100 A
2 p 100 A
3 p 100 A
4 p 100 A
5 p 100 A
6 p 100 A
7 p 100 A
8 p 120 B
9 p 110 B
10 p 150 C
11 p 150 C
12 p 100 G

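At time 1, only one seller offers the product, so the minimum price is 100 and the seller with the minimum price is A. At time 2, a second seller appears, but the minimum is still seller A's 100. Nothing changes through time 7.

At time 8, seller A runs out of stock, so the minimum price becomes 120 from seller B. At time 9, seller B lowers its price, so the minimum becomes 110, still from seller B. At time 10, seller B's price rises to 190, so seller C's 150 becomes the minimum. At time 11, seller G raises its price, which has no effect. At time 12, seller G lowers its price to 100, so the minimum price among sellers with stock is 100 and the seller is G.

In short, I want to find a product's minimum price at each point in time, counting only sellers that have stock.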
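To make the rule concrete, here is a minimal sketch that evaluates it for a single point in time (dt = 8). It assumes the same sample rows as above, written compactly with UNNEST; the CTE name latest_per_seller and the output column names are illustrative only.

```sql
WITH a AS (
  -- Same sample rows as in the question, in compact form
  SELECT * FROM UNNEST([
    STRUCT(1 AS dt, 'p' AS product_id, 'A' AS seller, 10 AS stock, 100 AS price),
    (2, 'p', 'B', 10, 120),
    (3, 'p', 'C', 10, 150),
    (4, 'p', 'D', 10, 300),
    (5, 'p', 'E', 10, 400),
    (6, 'p', 'F', 10, 500),
    (7, 'p', 'G', 10, 600),
    (8, 'p', 'A', 0, 100),
    (9, 'p', 'B', 10, 110),
    (10, 'p', 'B', 10, 190),
    (11, 'p', 'G', 10, 800),
    (12, 'p', 'G', 10, 100)
  ])
),
latest_per_seller AS (
  -- Each seller's most recent change at or before dt = 8
  SELECT
    product_id,
    seller,
    ARRAY_AGG(STRUCT(stock, price) ORDER BY dt DESC LIMIT 1)[OFFSET(0)] AS latest
  FROM a
  WHERE dt <= 8
  GROUP BY product_id, seller
)
SELECT
  product_id,
  MIN(latest.price) AS min_price,
  -- Ties on price are broken by seller name so the result is deterministic
  ARRAY_AGG(seller ORDER BY latest.price, seller LIMIT 1)[OFFSET(0)] AS seller_with_min_price
FROM latest_per_seller
WHERE latest.stock > 0  -- ignore sellers that are out of stock
GROUP BY product_id
```

For dt = 8, seller A's latest row has stock 0 and is skipped, so this should return 120 and seller B, matching the desired output.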

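To implement this logic I found a solution based on a cross join, but it takes too long and uses too many resources. I would like to know whether there is a better approach. I searched Stack Overflow and Google but could not find a good solution.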

sql google-bigquery window-functions aggregation rolling-computation
1 answer
0 votes

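How about something like this? (Written quickly, so it is probably not the most efficient.)

Build a list of all distinct periods and products ("distinct_periods"), then join in the sellers that have information as of each period ("all_sellers"), using a window function to keep only each seller's latest information.

Then a final aggregation finds your minimums (I left these as ARRAY_AGGs because you may want the whole history; just remove the LIMIT).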

WITH a AS
(
  SELECT 1 AS dt, 'p' as product_id, 'A' AS seller, 10 as stock, 100 as price
  UNION ALL
  SELECT 2 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 120 as price
  UNION ALL
  SELECT 3 AS dt, 'p' as product_id, 'C' AS seller, 10 as stock, 150 as price
  UNION ALL
  SELECT 4 AS dt, 'p' as product_id, 'D' AS seller, 10 as stock, 300 as price
  UNION ALL
  SELECT 5 AS dt, 'p' as product_id, 'E' AS seller, 10 as stock, 400 as price
  UNION ALL
  SELECT 6 AS dt, 'p' as product_id, 'F' AS seller, 10 as stock, 500 as price
  UNION ALL
  SELECT 7 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 600 as price
  UNION ALL
  SELECT 8 AS dt, 'p' as product_id, 'A' AS seller, 0 as stock, 100 as price
  UNION ALL
  SELECT 9 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 110 as price
  UNION ALL
  SELECT 10 AS dt, 'p' as product_id, 'B' AS seller, 10 as stock, 190 as price
  UNION ALL
  SELECT 11 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 800 as price
  UNION ALL
  SELECT 12 AS dt, 'p' as product_id, 'G' AS seller, 10 as stock, 100 as price
), distinct_periods as (
  -- every point in time at which something changed, per product
  select distinct dt, product_id
  from a
), all_sellers as (
  -- for every (period, product, seller), keep only the seller's most recent row at or before that period
  select
    distinct_periods.dt,
    distinct_periods.product_id,
    a.seller,
    -- an out-of-stock seller gets a NULL price so it is filtered out below
    case when a.stock = 0 then null else a.price end as instock_price
  from distinct_periods
  left join a
  on distinct_periods.product_id = a.product_id
  and distinct_periods.dt >= a.dt
  where true
  qualify row_number() over(partition by distinct_periods.dt, distinct_periods.product_id, a.seller order by a.dt desc) = 1
)
-- cheapest in-stock seller per period; each result is a one-element array because of LIMIT 1
select dt, product_id, array_agg(instock_price order by instock_price asc limit 1) as minimum_price, array_agg(seller order by instock_price asc limit 1) as seller_with_minimum_price
from all_sellers
where instock_price is not null
group by 1, 2
order by dt
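
One possible variation, not part of the answer above and not benchmarked: give each change row a validity interval with LEAD, so that every period matches exactly one row per seller and no per-seller deduplication is needed after the join. The names intervals and next_dt are illustrative, and the sample data is repeated in compact UNNEST form so the sketch runs on its own.

```sql
WITH a AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS dt, 'p' AS product_id, 'A' AS seller, 10 AS stock, 100 AS price),
    (2, 'p', 'B', 10, 120),
    (3, 'p', 'C', 10, 150),
    (4, 'p', 'D', 10, 300),
    (5, 'p', 'E', 10, 400),
    (6, 'p', 'F', 10, 500),
    (7, 'p', 'G', 10, 600),
    (8, 'p', 'A', 0, 100),
    (9, 'p', 'B', 10, 110),
    (10, 'p', 'B', 10, 190),
    (11, 'p', 'G', 10, 800),
    (12, 'p', 'G', 10, 100)
  ])
),
distinct_periods AS (
  SELECT DISTINCT dt, product_id
  FROM a
),
intervals AS (
  -- Each row is valid from its own dt until the same seller's next change
  SELECT
    *,
    LEAD(dt) OVER (PARTITION BY product_id, seller ORDER BY dt) AS next_dt
  FROM a
)
SELECT
  p.dt,
  p.product_id,
  MIN(i.price) AS min_price,
  -- Ties on price are broken by seller name so the result is deterministic
  ARRAY_AGG(i.seller ORDER BY i.price, i.seller LIMIT 1)[OFFSET(0)] AS seller_with_min_price
FROM distinct_periods AS p
JOIN intervals AS i
  ON p.product_id = i.product_id
  AND p.dt >= i.dt
  AND (i.next_dt IS NULL OR p.dt < i.next_dt)  -- NULL next_dt means the row is still current
WHERE i.stock > 0  -- ignore sellers that are out of stock
GROUP BY p.dt, p.product_id
ORDER BY p.dt
```

On the sample data this should reproduce the desired output from the question. As with the query above, a period in which every seller is out of stock produces no row at all; whether the interval join is actually cheaper on real data would need to be measured.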