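I'm working with Google Analytics data and want to convert the customDimensions and hits.customDimensions fields from arrays of index/value pairs into a set of flat subfields. For example, I want to turn data like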
hits.hitNumber | hits.customDimension.index | hits.customDimension.value |
---|---|---|
1 | 0 | a |
  | 1 | b |
2 | 0 | a |
  | 2 | c |
  | 1 | d |
into
hits.hitNumber | hits.cd_0 | hits.cd_1 | hits.cd_2 |
---|---|---|---|
1 | a | b | |
2 | a | d | c |
while keeping all other data the same. I'm currently running into problems with UNNEST: I get duplicated records.
This is where I'm currently stuck:
(
SELECT CASE
WHEN hcd.index = 0 then hcd.value
WHEN cd.index = 0 then cd.value
ELSE NULL
END AS temp
FROM UNNEST(hits) h, UNNEST(h.customDimensions) AS hcd,
UNNEST(t.customDimensions) AS cd
WHERE hcd.index=0 or cd.index=0
LIMIT 1
) as cd_0
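and the query seems to pick up only the value from the first hit. I think I somehow have to add this cd_0 field as a subfield of every entry in the hits record array. Is that possible?
Here is a SQL Server solution; I think you will need to adapt it to your database or data warehouse.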
-- Number the rows so that blank hit numbers can be filled down.
-- Note: ORDER BY (SELECT NULL) does not guarantee any particular order,
-- so this assumes the rows come back in the order shown in the question.
with cte as
(
select
* ,
row_number() over(order by (select null)) as seq
from customDimension
)
select coalesce(b.[hist.hitNumber], a.maxHitNumber) as [hist.hitNumber] ,
-- Pivot: one output column per custom dimension index.
max(case when b.[hist.customDimensionIndex] = 0 then b.[hist.customDimensionValue] end) as [cd_0] ,
max(case when b.[hist.customDimensionIndex] = 1 then b.[hist.customDimensionValue] end) as [cd_1] ,
max(case when b.[hist.customDimensionIndex] = 2 then b.[hist.customDimensionValue] end) as [cd_2]
from
(
-- For each row, take the largest non-null hit number at or before it.
select b.seq , max(a.[hist.hitNumber]) as maxHitNumber from
(select * from cte where [hist.hitNumber] is not null) as a inner join cte as b on a.seq <= b.seq
group by b.seq
) as a inner join cte as b on a.seq = b.seq
group by coalesce(b.[hist.hitNumber], a.maxHitNumber);
It seems the best solution is a combination of SELECT * REPLACE and nested subqueries:
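SELECT *
REPLACE(ARRAY(
    -- ARRAY( keeps hits a repeated field; a plain scalar subquery here
    -- fails for any session that has more than one hit.
    SELECT AS STRUCT *
    REPLACE((
        -- Replace each hit's index/value array with one flat STRUCT.
        SELECT AS STRUCT
          (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=0) cd_0,
          (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=1) cd_1,
          (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index=2) cd_2
      ) AS customDimensions
    )
    FROM UNNEST(hits) h
  ) AS hits
)
FROM t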
You can also use ARRAY( in place of the second ( after the inner REPLACE so that hits.customDimensions becomes a REPEATED field.
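To try the pattern without touching a real export table, here is a minimal sketch in BigQuery standard SQL. It assumes a made-up inline table `t` that mirrors the sample data in the question; the `visitId` column and the session-level value `'s'` are placeholders, not part of the original data. It also flattens the session-level customDimensions field in the same REPLACE list, since the question asks for both levels.

```sql
-- Hypothetical inline sample standing in for a Google Analytics export table.
WITH t AS (
  SELECT
    'session-1' AS visitId,  -- placeholder: shows that other columns pass through
    [STRUCT(0 AS index, 's' AS value)] AS customDimensions,
    [
      STRUCT(1 AS hitNumber,
             [STRUCT(0 AS index, 'a' AS value),
              STRUCT(1 AS index, 'b' AS value)] AS customDimensions),
      STRUCT(2 AS hitNumber,
             [STRUCT(0 AS index, 'a' AS value),
              STRUCT(2 AS index, 'c' AS value),
              STRUCT(1 AS index, 'd' AS value)] AS customDimensions)
    ] AS hits
)
SELECT * REPLACE (
  -- Session level: one flat STRUCT instead of the index/value array.
  (SELECT AS STRUCT
     (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index = 0) AS cd_0,
     (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index = 1) AS cd_1,
     (SELECT x.value FROM UNNEST(customDimensions) x WHERE x.index = 2) AS cd_2
  ) AS customDimensions,
  -- Hit level: rebuild the hits array one hit at a time.
  ARRAY(
    SELECT AS STRUCT h.* REPLACE (
      (SELECT AS STRUCT
         (SELECT x.value FROM UNNEST(h.customDimensions) x WHERE x.index = 0) AS cd_0,
         (SELECT x.value FROM UNNEST(h.customDimensions) x WHERE x.index = 1) AS cd_1,
         (SELECT x.value FROM UNNEST(h.customDimensions) x WHERE x.index = 2) AS cd_2
      ) AS customDimensions
    )
    FROM UNNEST(hits) AS h
    ORDER BY h.hitNumber  -- keep the hits in hit order inside the rebuilt array
  ) AS hits
)
FROM t
```

With this sample, hit 1 should come back with cd_0 = a, cd_1 = b and a NULL cd_2, and hit 2 with cd_0 = a, cd_1 = d, cd_2 = c, matching the target table in the question. Because nothing is unnested in the outer FROM clause, the result stays at one row per session, which avoids the duplicated records. Each scalar subquery assumes an index appears at most once per array; a repeated index would make the query fail.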
To give credit, I ...