I have data in BigQuery in the following format:
Year_Month | User_ID | Dim_1 | Dim_2 | Metric_1 | Metric_2 |
---|---|---|---|---|---|
2024-02 | a1 | cat | desktop | 1 | 34 |
2024-02 | a1 | dog | mobile | 1 | 23 |
2024-02 | a1 | dog | desktop | 1 | 12 |
2024-02 | a1 | mouse | tablet | 1 | 9 |
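The real data I'm working with has millions of User_IDs, 24 months of data, and a large number of dimension and metric columns that I don't need.
I want to group by Year_Month and User_ID, sum the metrics by dimension, and convert the result from wide to long format. Several columns also need cleaning so that related values are grouped together. For example, in the sample data, 'mobile' and 'tablet' should both be treated as 'mobile'. Many of the columns I'm working with need cleaning like this.
At the moment I have something like the following, which works for 1 metric, but I need 2 or more: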
create table test_wide as
(
select Year_Month,
User_ID,
sum(case when Dim_1 = 'cat' then Metric_1 end) as Cat_m1,
sum(case when Dim_1 = 'dog' then Metric_1 end) as Dog_m1,
sum(case when Dim_1 = 'mouse' then Metric_1 end) as Mouse_m1,
/* 'mobile' and 'tablet' are counted together as mobile */
sum(case when Dim_2 in ('mobile', 'tablet') then Metric_1 end) as Mobile_m1,
sum(case when Dim_2 = 'desktop' then Metric_1 end) as Desktop_m1
from data
group by 1,2
)
;
/* Convert from wide to long */
SELECT Year_Month, User_ID,
Dim,
SAFE_CAST(value AS int64) value
FROM (
SELECT Year_Month, User_ID,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(0)], r'^"|"$', '') Dim,
REGEXP_REPLACE(SPLIT(pair, ':')[OFFSET(1)], r'^"|"$', '') Value
FROM test_wide t,
UNNEST(SPLIT(REGEXP_REPLACE(to_json_string(t), r'{|}', ''))) pair
)
WHERE NOT LOWER(Dim) IN ('year_month', 'user_id')
and SAFE_CAST(value AS int64) > 0
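The output I'm looking for is this:
Year_Month | User_ID | Dimension | Metric_1 | Metric_2 |
---|---|---|---|---|
2024-02 | a1 | cat | 1 | 34 |
2024-02 | a1 | dog | 2 | 35 |
2024-02 | a1 | mouse | 1 | 9 |
2024-02 | a1 | desktop | 2 | 46 |
2024-02 | a1 | mobile | 2 | 32 |
The data needs to be in this long format before it can be fed into a dashboard tool. Any help is much appreciated. Thanks.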
Consider the query below, which uses the UNPIVOT operator:
WITH sample AS (
SELECT "2024-02" as year_month, "a1" as user_id, "cat" as dim_1, "desktop" as dim_2, 1 as metric_1, 34 as metric_2 UNION ALL
SELECT "2024-02" as year_month, "a1" as user_id, "dog" as dim_1, "mobile" as dim_2, 1 as metric_1, 23 as metric_2 UNION ALL
SELECT "2024-02" as year_month, "a1" as user_id, "dog" as dim_1, "desktop" as dim_2, 1 as metric_1, 12 as metric_2 UNION ALL
SELECT "2024-02" as year_month, "a1" as user_id, "mouse" as dim_1, "tablet" as dim_2, 1 as metric_1, 9 as metric_2
)
SELECT
year_month,
user_id,
dimension,
sum(metric_1) metric_1,
sum(metric_2) metric_2
FROM sample
UNPIVOT(dimension FOR original_dim IN (dim_1, dim_2))
GROUP BY year_month, user_id, dimension
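Note that the query above leaves 'tablet' as its own dimension value, whereas the question asks for 'mobile' and 'tablet' to be counted together as 'mobile'. One possible way to handle that, as a minimal sketch, is to normalise the dimension columns in an intermediate CTE before unpivoting. The CTE name `cleaned` and the exact CASE mapping are assumptions for illustration; the real cleanup rules would depend on your data.

```sql
WITH sample AS (
  SELECT "2024-02" AS year_month, "a1" AS user_id, "cat" AS dim_1, "desktop" AS dim_2, 1 AS metric_1, 34 AS metric_2 UNION ALL
  SELECT "2024-02", "a1", "dog", "mobile", 1, 23 UNION ALL
  SELECT "2024-02", "a1", "dog", "desktop", 1, 12 UNION ALL
  SELECT "2024-02", "a1", "mouse", "tablet", 1, 9
),
-- Hypothetical cleanup step: keep only the needed columns and
-- map raw dimension values onto the groups you want to report.
cleaned AS (
  SELECT
    year_month,
    user_id,
    dim_1,
    CASE
      WHEN dim_2 IN ('mobile', 'tablet') THEN 'mobile'
      ELSE dim_2
    END AS dim_2,
    metric_1,
    metric_2
  FROM sample
)
SELECT
  year_month,
  user_id,
  dimension,
  SUM(metric_1) AS metric_1,
  SUM(metric_2) AS metric_2
FROM cleaned
-- Stack dim_1 and dim_2 into a single "dimension" column (wide to long).
UNPIVOT(dimension FOR original_dim IN (dim_1, dim_2))
GROUP BY year_month, user_id, dimension
```

On the sample rows this should give cat (1, 34), dog (2, 35), mouse (1, 9), desktop (2, 46) and mobile (2, 32), in no guaranteed order since there is no ORDER BY. If the same value could appear in both dim_1 and dim_2, you may also want to add original_dim to the SELECT and GROUP BY so the two sources are not merged.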