Per-row calculations and a median in BigQuery - is this possible?


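My problem is that I get a set of raw sensor data that needs some processing before I can use it. Loading the data onto the client and processing it there is very slow, so I am looking for a way to offload this logic to BigQuery.

Imagine I have some constants for the sensors. They can change, but I will have them at hand whenever I want to run a query.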

A: 1, B: 2, C: 3, D: 2, E: 1, F: 2

The sensors are connected, and I know which sensors are connected to each other. What that means is shown below.

A: BC
D: EF

Here is a table with a measurement for every sensor at every timestamp. Imagine thousands of rows.

TS    A  |  B  |  C  |  D  |  E  |  F  
01    10 |  20 |  20 |  10 |  15 | 10
02    11 |  10 |  20 |  20 |  10 | 10
03    12 |  20 |  10 |  10 |  12 | 11
04    13 |  10 |  10 |  20 |  15 | 15
05    11 |  20 |  10 |  15 |  14 | 14
06    10 |  20 |  10 |  10 |  15 | 12

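I want to query ts 01 through ts 06 (in practice this could again be 1000 rows). I don't want the raw data returned; I want BigQuery to do some calculations:

First, for every row, I need to subtract the constants, so row 01 would look like: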

01    9 |  18 |  17 |  8 |  14 | 8

Then A needs to be subtracted from B and C, and D from E and F:

01    9 |   9 |   8 |  8 |   6 | 0

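As the last step, once I have all the rows, I want to return rows in which each sensor holds the median of that sensor's values over the last X rows. So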

    TS    A  |  B  | 
    01    10 |  1  | 
    02    11 |  2  |  
    03    12 |  2  |  
    04    13 |  1  |  
    05    11 |  2  | 
    06    10 |  3  |  
    07    10 |  4  | 
    08    11 |  2  |  
    09    12 |  2  |  
    10    13 |  10 |  
    11    11 |  20 | 
    12    10 |  20 |  

returns (for X = 4)

    TS    A  |  B  | 
   //first 3 needed for median for 4th value
    04    11.5 |  etc  |   //median 10, 11, 12, 13
    05    11.5 |  etc  |   //median 11, 12, 13, 11
    06    11.5 |  etc  |   //median 12, 13, 11, 10
    07    etc |  etc | 
    08    etc |  etc |  
    09    etc |  etc |  
    10    etc |  etc |  
    11    etc |  etc | 
    12    etc |  etc |  

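Sending the data to my server and doing the calculations there is very slow. I would really like to know whether BigQuery can handle this amount of data, so that I get a quickly computed result set using settings of my own choosing!

I do this in Node.js... but in BigQuery SQL... I'm lost.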

google-bigquery calculated-columns median
1 answer

Below is for BigQuery Standard SQL

If you were looking for AVG values, this would be "simple", as shown below

#standardSQL
WITH constants AS (
  SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
  SELECT ts,
    a - val_a AS a, 
    b - val_b - a + val_a AS b,
    c - val_c - a + val_a AS c,
    d - val_d AS d,
    e - val_e - d + val_d  AS e,
    f - val_f - d + val_d AS f
  FROM `project.dataset.measurements`, constants
)
SELECT ts, 
  AVG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,  
  AVG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,  
  AVG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,  
  AVG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,  
  AVG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,  
  AVG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f  
FROM temp
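
The frame `ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING` is repeated for every column above. As a minimal sketch, assuming the question's sample rows are inlined as a CTE and using an illustrative window name `w`, the frame could be declared once in a `WINDOW` clause. Only sensors a, b and c are shown here; d, e and f would follow the same pattern.

```sql
#standardSQL
-- Sketch only: `measurements` inlines sample rows, and `w` is an illustrative window name.
WITH measurements AS (
  SELECT 1 AS ts, 10 AS a, 20 AS b, 20 AS c UNION ALL
  SELECT 2, 11, 10, 20 UNION ALL
  SELECT 3, 12, 20, 10 UNION ALL
  SELECT 4, 13, 10, 10 UNION ALL
  SELECT 5, 11, 20, 10 UNION ALL
  SELECT 6, 10, 20, 10
), constants AS (
  SELECT 1 AS val_a, 2 AS val_b, 3 AS val_c
), temp AS (
  SELECT ts,
    a - val_a AS a,                 -- sensor A minus its constant
    b - val_b - (a - val_a) AS b,   -- corrected B minus corrected A
    c - val_c - (a - val_a) AS c    -- corrected C minus corrected A
  FROM measurements, constants
)
SELECT ts,
  AVG(a) OVER w AS a,
  AVG(b) OVER w AS b,
  AVG(c) OVER w AS c
FROM temp
-- The frame covers the 4 rows before the current one and excludes the current row.
WINDOW w AS (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING)
ORDER BY ts
```

With this layout, changing X means editing a single line rather than one line per sensor.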

For MEDIAN, you need to add some extras, as in the example below

#standardSQL
WITH constants AS (
  SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
  SELECT ts,
    a - val_a AS a, 
    b - val_b - a + val_a AS b,
    c - val_c - a + val_a AS c,
    d - val_d AS d,
    e - val_e - d + val_d  AS e,
    f - val_f - d + val_d AS f
  FROM `project.dataset.measurements`, constants
)
SELECT ts,
  (SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
  (SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
  (SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
  (SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
  (SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
  (SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
  SELECT ts, 
    ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,  
    ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,  
    ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,  
    ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,  
    ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,  
    ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f  
  FROM temp
)
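
BigQuery's `PERCENTILE_CONT` is an analytic function whose `OVER` clause accepts `PARTITION BY` but no `ORDER BY` or `ROWS` frame, which is presumably why the query above first collects each frame with `ARRAY_AGG` and then takes the percentile over the unnested array. The repeated subquery could be wrapped in a temporary SQL function. The following is a minimal sketch; the function name `array_median`, the inlined sample rows and the reduced column set (a, b, c only) are assumptions made for illustration.

```sql
#standardSQL
-- Hypothetical helper: median of an array, via PERCENTILE_CONT over its elements.
CREATE TEMP FUNCTION array_median(arr ANY TYPE) AS ((
  SELECT PERCENTILE_CONT(x, 0.5) OVER() FROM UNNEST(arr) AS x LIMIT 1
));

WITH measurements AS (
  SELECT 1 AS ts, 10 AS a, 20 AS b, 20 AS c UNION ALL
  SELECT 2, 11, 10, 20 UNION ALL
  SELECT 3, 12, 20, 10 UNION ALL
  SELECT 4, 13, 10, 10 UNION ALL
  SELECT 5, 11, 20, 10 UNION ALL
  SELECT 6, 10, 20, 10
), constants AS (
  SELECT 1 AS val_a, 2 AS val_b, 3 AS val_c
), temp AS (
  SELECT ts,
    a - val_a AS a,
    b - val_b - (a - val_a) AS b,
    c - val_c - (a - val_a) AS c
  FROM measurements, constants
)
SELECT ts,
  array_median(a) AS a,
  array_median(b) AS b,
  array_median(c) AS c
FROM (
  -- Collect the values of the 4 preceding rows into one array per sensor.
  SELECT ts,
    ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS a,
    ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS b,
    ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS c
  FROM temp
)
ORDER BY ts
```

For the first row the frame is empty, so the array is NULL and the helper returns NULL.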

You can test and play with the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.measurements` AS (
  SELECT 01 ts, 10 a, 20 b, 20 c, 10 d, 15 e, 10 f UNION ALL
  SELECT 02, 11, 10, 20, 20, 10, 10 UNION ALL
  SELECT 03, 12, 20, 10, 10, 12, 11 UNION ALL
  SELECT 04, 13, 10, 10, 20, 15, 15 UNION ALL
  SELECT 05, 11, 20, 10, 15, 14, 14 UNION ALL
  SELECT 06, 10, 20, 10, 10, 15, 12 
), constants AS (
  SELECT 1 val_a, 2 val_b, 3 val_c, 2 val_d, 1 val_e, 2 val_f
), temp AS (
  SELECT ts,
    a - val_a AS a, 
    b - val_b - a + val_a AS b,
    c - val_c - a + val_a AS c,
    d - val_d AS d,
    e - val_e - d + val_d  AS e,
    f - val_f - d + val_d AS f
  FROM `project.dataset.measurements`, constants
)
SELECT ts,
  (SELECT PERCENTILE_CONT(a, 0.5) OVER() FROM UNNEST(a) a LIMIT 1) a,
  (SELECT PERCENTILE_CONT(b, 0.5) OVER() FROM UNNEST(b) b LIMIT 1) b,
  (SELECT PERCENTILE_CONT(c, 0.5) OVER() FROM UNNEST(c) c LIMIT 1) c,
  (SELECT PERCENTILE_CONT(d, 0.5) OVER() FROM UNNEST(d) d LIMIT 1) d,
  (SELECT PERCENTILE_CONT(e, 0.5) OVER() FROM UNNEST(e) e LIMIT 1) e,
  (SELECT PERCENTILE_CONT(f, 0.5) OVER() FROM UNNEST(f) f LIMIT 1) f
FROM (
  SELECT ts, 
    ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) a,  
    ARRAY_AGG(b) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) b,  
    ARRAY_AGG(c) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) c,  
    ARRAY_AGG(d) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) d,  
    ARRAY_AGG(e) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) e,  
    ARRAY_AGG(f) OVER(ORDER BY ts ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) f  
  FROM temp
)
-- ORDER BY ts   

Result

Row ts  a       b       c       d       e       f    
1   1   null    null    null    null    null    null     
2   2   9.0     9.0     8.0     8.0     6.0     0.0  
3   3   9.5     3.5     7.5     13.0    -1.5    -5.0     
4   4   10.0    7.0     7.0     8.0     3.0     0.0  
5   5   10.5    2.5     1.5     13.0    -0.5    -2.5     
6   6   10.5    2.5     -3.5    15.5    -2.0    -3.0       
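
Note that the frame `4 PRECEDING AND 1 PRECEDING` leaves out the current row, which is why ts 1 is null and ts 2 reflects ts 1 only. The question's expected output for X = 4 (11.5 at ts 04, from 10, 11, 12, 13) includes the current row and starts only once four rows are available. A minimal sketch of that variant, assuming the A column of the question's second table is inlined via `UNNEST` (the names `pos`, `win_vals` and `a_median` are illustrative):

```sql
#standardSQL
-- Sketch only: sensor A values for ts 1..12, taken from the question's second table.
WITH measurements AS (
  SELECT pos + 1 AS ts, a
  FROM UNNEST([10, 11, 12, 13, 11, 10, 10, 11, 12, 13, 11, 10]) AS a WITH OFFSET AS pos
)
SELECT ts,
  (SELECT PERCENTILE_CONT(x, 0.5) OVER() FROM UNNEST(win_vals) AS x LIMIT 1) AS a_median
FROM (
  -- Frame of X = 4 rows: the current row plus the 3 rows before it.
  SELECT ts,
    ARRAY_AGG(a) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS win_vals
  FROM measurements
)
-- Drop the first 3 rows, whose frames hold fewer than 4 values.
WHERE ARRAY_LENGTH(win_vals) = 4
ORDER BY ts
```

For ts 4, 5 and 6 this should give 11.5, matching the values listed in the question.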