Change a repeated record within another repeated record to a set of flat fields

Question (votes: 0, answers: 2)

I'm working with Google Analytics data and want to convert the customDimensions and hits.customDimensions fields from index/value pairs into a set of flat subfields. For example, if I have data like

hits.hitNumber  hits.customDimensions.index  hits.customDimensions.value
1               0                            a
                1                            b
2               0                            a
                2                            c
                1                            d

I want to turn it into

hits.hitNumber  hits.cd_0  hits.cd_1  hits.cd_2
1               a          b
2               a          d          c

while keeping all other data the same. I'm currently having trouble with UNNEST: I get duplicated records.

I'm currently stuck at

(
   SELECT CASE
   WHEN hcd.index = 0 then hcd.value
   WHEN cd.index = 0 then cd.value
   ELSE NULL
   END AS temp
   FROM UNNEST(hits) h, UNNEST(h.customDimensions) AS hcd, 
   UNNEST(t.customDimensions) AS cd
   WHERE hcd.index=0 or cd.index=0
   LIMIT 1
) as cd_0

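and the query seems to pick up only the value from the first hit. I think I somehow need to add this cd_0 field as a subfield of every entry in the hits record array. Is that possible?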

sql google-bigquery google-analytics
2 Answers

0 votes

Below is a SQL Server solution; I think you will need to adapt it to your database/data warehouse.

-- Number the rows so that blank hitNumber cells can be filled in from earlier rows.
-- Note: order by (select null) does not guarantee any particular row order.
with cte as
    (
        select 
            * ,
            row_number() over(order by (select null)) as seq 
        from customDimension 
    )
    select coalesce(b.[hist.hitNumber], a.maxHitNumber) as [hist.hitNumber] ,
    -- pivot: one output column per custom dimension index
    max(case when b.[hist.customDimensionIndex] = 0 then b.[hist.customDimensionValue] end) as [cd_0] , 
    max(case when b.[hist.customDimensionIndex] = 1 then b.[hist.customDimensionValue] end) as [cd_1] ,
    max(case when b.[hist.customDimensionIndex] = 2 then b.[hist.customDimensionValue] end) as [cd_2]
    from
    (
        -- for each row, the largest non-null hitNumber at or before it
        -- (assumes hitNumber increases with seq)
        select b.seq , max(a.[hist.hitNumber]) as maxHitNumber from
        (select * from cte where [hist.hitNumber] is not null) as a inner join cte as b on a.seq <= b.seq
        group by b.seq
    ) as a inner join cte as b on a.seq = b.seq
    group by coalesce(b.[hist.hitNumber], a.maxHitNumber);

0 votes

It seems the best solution is a combination of SELECT * REPLACE and nested subqueries:

SELECT *
    REPLACE(ARRAY(
        -- ARRAY(...) keeps hits REPEATED; a scalar subquery here would fail
        -- for any session with more than one hit
        SELECT AS STRUCT * 
        REPLACE((
            -- one flat field per custom dimension index
            SELECT AS STRUCT
                (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=0) cd_0,
                (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=1) cd_1,
                (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=2) cd_2
            ) AS customDimensions
        )
        FROM UNNEST(hits) h
    ) AS hits
)
FROM t

You can also use ARRAY( in place of the second ( after the inner REPLACE to switch the field hits.customDimensions to REPEATED.
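
To try this approach on the sample data from the question, here is a minimal self-contained sketch in BigQuery Standard SQL. The inline table t, the sessionId column, and the literal values are made up for illustration and are not part of the real GA export. The hits subquery is wrapped in ARRAY( ... ) so the example works when a session has several hits, and it assumes each index occurs at most once per hit (otherwise the scalar subqueries raise an error).

```sql
-- Hypothetical inline data standing in for the GA export table `t`.
WITH t AS (
  SELECT
    'session-1' AS sessionId,  -- made-up column, only to show other fields are kept
    [
      STRUCT(
        1 AS hitNumber,
        [STRUCT(0 AS index, 'a' AS value),
         STRUCT(1 AS index, 'b' AS value)] AS customDimensions),
      STRUCT(
        2 AS hitNumber,
        [STRUCT(0 AS index, 'a' AS value),
         STRUCT(2 AS index, 'c' AS value),
         STRUCT(1 AS index, 'd' AS value)] AS customDimensions)
    ] AS hits
)
SELECT * REPLACE (
  -- Rebuild the hits array one element at a time.
  ARRAY(
    SELECT AS STRUCT * REPLACE (
      -- Swap the repeated index/value pairs for one flat field per index.
      (SELECT AS STRUCT
         (SELECT x.value FROM UNNEST(h.customDimensions) AS x WHERE x.index = 0) AS cd_0,
         (SELECT x.value FROM UNNEST(h.customDimensions) AS x WHERE x.index = 1) AS cd_1,
         (SELECT x.value FROM UNNEST(h.customDimensions) AS x WHERE x.index = 2) AS cd_2
      ) AS customDimensions
    )
    FROM UNNEST(hits) AS h
  ) AS hits
)
FROM t;
```

With this data, hit 1 should come back with cd_0 = a, cd_1 = b and cd_2 = NULL, and hit 2 with cd_0 = a, cd_1 = d and cd_2 = c. The flat fields end up under hits.customDimensions (for example hits.customDimensions.cd_0) rather than directly under hits, and sessionId is carried through unchanged.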

For credit: I found the SELECT * REPLACE(( SELECT AS STRUCT )) part in this answer, and the nested subquery part here.
