Group by dimensions and gather from wide to long with multiple metrics


I have data in BigQuery in the following format:

Year_Month  User_ID  Dim_1  Dim_2    Metric_1  Metric_2
2024-02     a1       cat    desktop  1         34
2024-02     a1       dog    mobile   1         23
2024-02     a1       dog    desktop  1         12
2024-02     a1       mouse  tablet   1         9

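The real data I'm working with has millions of User_IDs, 24 months of data, and a lot of dimension and metric columns that I don't need.

I want to group by Year_Month and User_ID, sum the metrics by dimension, and convert the result from wide to long format. Several columns also need cleaning so that values are grouped together. For example, in the sample data, 'mobile' and 'tablet' should both be treated as 'mobile'. Many of the columns I'm working with need to be cleaned like this.

Currently I have something like this that works for 1 metric, but I need 2 or more metrics: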

create table test_wide as 
(
select Year_Month,
User_ID,
sum(case when Dim_1 = 'cat' then Metric_1 end) as Cat_m1,
sum(case when Dim_1 = 'dog' then Metric_1 end) as Dog_m1,
sum(case when Dim_1 = 'mouse' then Metric_1 end) as Mouse_m1,
sum(case when Dim_2 in ('mobile', 'tablet') then Metric_1 end) as Mobile_m1,
sum(case when Dim_2 = 'desktop' then Metric_1 end) as Desktop_m1
from data
group by 1,2
) 
;

/* Convert from wide to long */

SELECT Year_Month, User_ID,
    Dim, 
    SAFE_CAST(value AS int64) value
FROM (
    SELECT Year_Month, User_ID, 
      REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') Dim, 
      REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') Value 
    FROM test_wide t, 
    UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
  )
  WHERE NOT LOWER(Dim) IN ('year_month', 'user_id')
  and SAFE_CAST(value AS int64) > 0

The output I'm looking for is this:

Year_Month  User_ID  Dim      Metric_1  Metric_2
2024-02     a1       cat      1         34
2024-02     a1       dog      2         35
2024-02     a1       mouse    1         9
2024-02     a1       desktop  2         46
2024-02     a1       mobile   2         32

The data needs to be in this long format before it can be fed into a dashboard tool. Any help is greatly appreciated. Thanks.

sql google-bigquery
1 Answer

Consider the query below, which uses the UNPIVOT operator:

WITH sample AS (
  SELECT "2024-02" as year_month, "a1" as user_id, "cat" as dim_1, "desktop" as dim_2, 1 as metric_1, 34 as metric_2 UNION ALL
  SELECT "2024-02" as year_month, "a1" as user_id, "dog" as dim_1, "mobile" as dim_2, 1 as metric_1, 23 as metric_2 UNION ALL
  SELECT "2024-02" as year_month, "a1" as user_id, "dog" as dim_1, "desktop" as dim_2, 1 as metric_1, 12 as metric_2 UNION ALL
  SELECT "2024-02" as year_month, "a1" as user_id, "mouse" as dim_1, "tablet" as dim_2, 1 as metric_1, 9 as metric_2
)
SELECT
  year_month,
  user_id,
  dimension,
  sum(metric_1) metric_1,
  sum(metric_2) metric_2
FROM sample
UNPIVOT(dimension FOR original_dim IN (dim_1, dim_2))
GROUP BY year_month, user_id, dimension
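
The query above keeps 'mobile' and 'tablet' as separate dimension values. If they need to be folded together as the question describes, one option is to clean the columns in a CTE before unpivoting. This is a minimal sketch on the same sample data; the CTE name `cleaned` is only illustrative, and the CASE expression would have to be repeated for every column that needs cleaning:

```sql
WITH sample AS (
  SELECT "2024-02" AS year_month, "a1" AS user_id, "cat" AS dim_1, "desktop" AS dim_2, 1 AS metric_1, 34 AS metric_2 UNION ALL
  SELECT "2024-02", "a1", "dog", "mobile", 1, 23 UNION ALL
  SELECT "2024-02", "a1", "dog", "desktop", 1, 12 UNION ALL
  SELECT "2024-02", "a1", "mouse", "tablet", 1, 9
),
cleaned AS (
  -- Keep only the columns that are needed, and map 'tablet' onto 'mobile'
  -- before the dimension columns are unpivoted.
  SELECT
    year_month,
    user_id,
    dim_1,
    CASE WHEN dim_2 IN ('mobile', 'tablet') THEN 'mobile' ELSE dim_2 END AS dim_2,
    metric_1,
    metric_2
  FROM sample
)
SELECT
  year_month,
  user_id,
  dimension,
  SUM(metric_1) AS metric_1,
  SUM(metric_2) AS metric_2
FROM cleaned
-- Each input row becomes one row per dimension column (dim_1, dim_2).
UNPIVOT(dimension FOR original_dim IN (dim_1, dim_2))
GROUP BY year_month, user_id, dimension
```

On the sample rows this should give the five rows the question asks for, with 'mobile' summing to 2 and 32. Selecting only the required columns in the CTE also keeps the unneeded dimension and metric columns of the real table out of the UNPIVOT input.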
