I have a table in which several columns contain either a single value or comma-separated values, as shown below:
`No  Records  value1  value2  value3  value4  value5
---  -------  ------  ------  ------  ------  ------
1.   A,B      F,G     D,E     Noval   Noval   Z,U
2.   Z        X       Noval   P,Q     S,T     Noval`
The required output should look like this: `
No   Records  value1  value2  value3  value4  value5
---  -------  ------  ------  ------  ------  ------
1.   A        F       D       Noval   Noval   Z
1.   B        G       E       Noval   Noval   U
2.   Z        X       Noval   P       S       Noval
2.   Z        X       Noval   Q       T       Noval
` I tried UNNEST but could not work out how to get this result, as I am new to BigQuery.
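First, each column needs to be split on the comma. Then UNNEST comes into play. If every column is unnested, all combinations of their values are created. By using the offset of each element, the unwanted combinations can be eliminated.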
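WITH
  tbl AS (
    SELECT
      SPLIT('A,B') AS Records,  -- SPLIT uses ',' as its default delimiter
      SPLIT('F,G') AS value1)
SELECT
  *
FROM
  tbl,
  UNNEST(Records) AS Records_name WITH OFFSET Records_id,
  UNNEST(value1) AS value1_name WITH OFFSET value1_id
-- Keep only the pairs that sit at the same position in both arrays.
WHERE Records_id = value1_id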
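As a quick check of what the offset filter does, the following sketch counts the rows produced by the comma join with and without the offset condition. It reuses the same two sample arrays; the output column names `all_combinations` and `matching_offsets` are made up for illustration.

```sql
WITH
  tbl AS (
    SELECT
      SPLIT('A,B') AS Records,
      SPLIT('F,G') AS value1)
SELECT
  -- Every element of Records paired with every element of value1.
  COUNT(*) AS all_combinations,
  -- Only the pairs whose positions agree (A with F, B with G).
  COUNTIF(Records_id = value1_id) AS matching_offsets
FROM
  tbl,
  UNNEST(Records) AS Records_name WITH OFFSET Records_id,
  UNNEST(value1) AS value1_name WITH OFFSET value1_id
```

With two elements in each array this should report 4 combinations, of which 2 have matching offsets.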
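However, this fails if the arrays have different lengths. Therefore we generate an array of the numbers 0 to 10,000 and unnest it as `tmp`. For each number we take the array entry at that position. If the number is greater than or equal to the array length, we take the last entry instead.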
IF(tmp>=ARRAY_LENGTH(value1),value1[SAFE_OFFSET(ARRAY_LENGTH(value1)-1)], value1[SAFE_OFFSET(tmp)]) as value1,
This can be written more concisely as
value1[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value1)-1,tmp))] AS value1,
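To see the clamped index at work, here is a minimal sketch on two arrays of different lengths. The sample values are taken from row 2 of the question; the aliases `value1_item` and `value3_item` and the small range `GENERATE_ARRAY(0, 2)` are assumptions chosen only to keep the output short.

```sql
WITH
  tbl AS (
    SELECT
      SPLIT('X') AS value1,    -- one element
      SPLIT('P,Q') AS value3)  -- two elements
SELECT
  tmp,
  -- LEAST caps the index at the last valid position of each array.
  value1[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value1) - 1, tmp))] AS value1_item,
  value3[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value3) - 1, tmp))] AS value3_item
FROM
  tbl,
  UNNEST(GENERATE_ARRAY(0, 2)) AS tmp
ORDER BY
  tmp
```

This should return (0, X, P), (1, X, Q) and (2, X, Q). The row for `tmp = 2` only repeats the last entries, because 2 is past the end of both arrays.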
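The `WHERE` condition removes the unneeded entries, namely those where `tmp` is greater than or equal to the length of the longest array.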
WITH
  -- Sample data from the question.
  tbl AS (
    SELECT
      1 AS row_num,
      'A,B' AS Records,
      'F,G' AS value1,
      'D,E' AS value2,
      'Noval' AS value3,
      'Noval' AS value4,
      'Z,U' AS value5
    UNION ALL
    SELECT
      2,
      'Z',
      'X',
      'Noval',
      'P,Q',
      'S,T',
      'Noval' ),
  -- Turn every comma-separated string into an array.
  tbl1 AS (
    SELECT
      row_num,
      SPLIT(Records) AS Records,
      SPLIT(value1) AS value1,
      SPLIT(value2) AS value2,
      SPLIT(value3) AS value3,
      SPLIT(value4) AS value4,
      SPLIT(value5) AS value5,
    FROM
      tbl ),
  -- For each index tmp, pick the entry at that position, or the last entry if the array is shorter.
  tbl2 AS (
    SELECT
      row_num,
      tmp,
      Records[SAFE_OFFSET(LEAST(ARRAY_LENGTH(Records)-1,tmp))] AS Records,
      value1[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value1)-1,tmp))] AS value1,
      value2[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value2)-1,tmp))] AS value2,
      value3[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value3)-1,tmp))] AS value3,
      value4[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value4)-1,tmp))] AS value4,
      value5[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value5)-1,tmp))] AS value5,
    FROM
      tbl1,
      UNNEST(GENERATE_ARRAY(0,10000)) AS tmp
    -- Drop every index beyond the longest array of the row.
    WHERE
      tmp<GREATEST(ARRAY_LENGTH(Records),ARRAY_LENGTH(value1),ARRAY_LENGTH(value2),ARRAY_LENGTH(value3),ARRAY_LENGTH(value4),ARRAY_LENGTH(value5)) )
SELECT
  *
FROM
  tbl2
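As a possible variation, not part of the query above, the index range can be generated per row instead of using a fixed upper bound of 10,000 plus a `WHERE` filter. The sketch below assumes a reduced stand-in for `tbl1` with only two array columns; the aliases `Records_item` and `value3_item` are made up for illustration.

```sql
WITH
  -- Hypothetical reduced version of tbl1: two rows, two array columns.
  tbl1 AS (
    SELECT
      1 AS row_num,
      SPLIT('A,B') AS Records,
      SPLIT('Noval') AS value3
    UNION ALL
    SELECT
      2,
      SPLIT('Z'),
      SPLIT('P,Q'))
SELECT
  row_num,
  Records[SAFE_OFFSET(LEAST(ARRAY_LENGTH(Records) - 1, tmp))] AS Records_item,
  value3[SAFE_OFFSET(LEAST(ARRAY_LENGTH(value3) - 1, tmp))] AS value3_item
FROM
  tbl1,
  -- The range 0 .. (longest array length - 1) is computed for each row,
  -- so no fixed limit and no filtering step are needed.
  UNNEST(GENERATE_ARRAY(0, GREATEST(ARRAY_LENGTH(Records), ARRAY_LENGTH(value3)) - 1)) AS tmp
ORDER BY
  row_num,
  tmp
```

For this sample the result should be (1, A, Noval), (1, B, Noval), (2, Z, P) and (2, Z, Q). The trade-off is that the list of `ARRAY_LENGTH` calls moves from the `WHERE` clause into the `FROM` clause, and it still has to name every array column.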