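I am looking for a way in BigQuery to count how many times a column holds the same value in consecutive rows. I currently have the table below, and I would like a new column that counts how often the value in the flow column occurs in sequence.
Data: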
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT(1017911 AS session_id, 'Home Page' AS page_group, 'Home Page' AS page_name, 1 AS steps_flow),
    (1017911, 'Site Search', 'Site Search', 2),
    (1017911, 'Range - PIP', 'Wall shelves', 3),
    (1017911, 'Range - PIP', 'Wall shelves', 4),
    (1017911, 'Range - PIP', 'Wall shelves', 5),
    (1017911, 'Range - PIP', 'Wall shelves', 6),
    (1017911, 'Site Search', 'Site Search', 7),
    (1017911, 'Site Search', 'Site Search', 8),
    (1017911, 'Ideas', 'Ideas', 9),
    (1017911, 'Range - PLP', 'EKTORP series', 10),
    (1017911, 'Range - PIP', 'Sofas', 11)
  ])
)
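(The new column f0 is the expected output):
I could not come up with the right query.
This is a kind of gaps-and-islands problem, and you might consider the query below: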
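-- Count rows per island; session_id is included so islands from different sessions never merge
SELECT * EXCEPT(gap, island), COUNT(1) OVER (PARTITION BY session_id, island) FROM (
  -- island: running count of gaps, so each unbroken run gets its own number
  SELECT *, COUNTIF(gap) OVER w1 AS island FROM (
    -- gap: TRUE when page_name differs from the previous row in the session
    SELECT *, page_name <> LAG(page_name) OVER w0 AS gap
    FROM sample_table
    WINDOW w0 AS (PARTITION BY session_id ORDER BY steps_flow)
  ) WINDOW w1 AS (PARTITION BY session_id ORDER BY steps_flow)
);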
-- Query results
+------------+-------------+---------------+------------+-----+
| session_id | page_group | page_name | steps_flow | f0_ |
+------------+-------------+---------------+------------+-----+
| 1017911 | Home Page | Home Page | 1 | 1 |
| 1017911 | Site Search | Site Search | 2 | 1 |
| 1017911 | Range - PIP | Wall shelves | 3 | 4 |
| 1017911 | Range - PIP | Wall shelves | 4 | 4 |
| 1017911 | Range - PIP | Wall shelves | 5 | 4 |
| 1017911 | Range - PIP | Wall shelves | 6 | 4 |
| 1017911 | Site Search | Site Search | 7 | 2 |
| 1017911 | Site Search | Site Search | 8 | 2 |
| 1017911 | Ideas | Ideas | 9 | 1 |
| 1017911 | Range - PLP | EKTORP series | 10 | 1 |
| 1017911 | Range - PIP | Sofas | 11 | 1 |
+------------+-------------+---------------+------------+-----+
You can test the query with the following sample data.
WITH sample_table AS (
SELECT '1017911' session_id, 'Home Page' page_group, 'Home Page' page_name, 1 steps_flow UNION ALL
SELECT '1017911', 'Site Search', 'Site Search', 2 UNION ALL
SELECT '1017911', 'Range - PIP', 'Wall shelves', 3 UNION ALL
SELECT '1017911', 'Range - PIP', 'Wall shelves', 4 UNION ALL
SELECT '1017911', 'Range - PIP', 'Wall shelves', 5 UNION ALL
SELECT '1017911', 'Range - PIP', 'Wall shelves', 6 UNION ALL
SELECT '1017911', 'Site Search', 'Site Search', 7 UNION ALL
SELECT '1017911', 'Site Search', 'Site Search', 8 UNION ALL
SELECT '1017911', 'Ideas', 'Ideas',9 UNION ALL
SELECT '1017911', 'Range - PLP', 'EKTORP series',10 UNION ALL
SELECT '1017911', 'Range - PIP', 'Sofas',11
)
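As a cross-check outside BigQuery, the three steps of the gaps-and-islands query (flag a change, number the islands with a running count of the flags, count the rows in each island) can be mirrored in plain Python. This is only a sketch under stated assumptions: the function name run_lengths and the tuple layout are made up for illustration, and the rows are the sample data from the question with page_group left out.

```python
from collections import Counter

# Sample rows as (session_id, page_name, steps_flow), taken from the question.
rows = [
    ("1017911", "Home Page", 1),
    ("1017911", "Site Search", 2),
    ("1017911", "Wall shelves", 3),
    ("1017911", "Wall shelves", 4),
    ("1017911", "Wall shelves", 5),
    ("1017911", "Wall shelves", 6),
    ("1017911", "Site Search", 7),
    ("1017911", "Site Search", 8),
    ("1017911", "Ideas", 9),
    ("1017911", "EKTORP series", 10),
    ("1017911", "Sofas", 11),
]


def run_lengths(rows):
    """Return one run length per row, in (session_id, steps_flow) order.

    Hypothetical helper, not part of any library.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    islands = []  # (session_id, island number) for each row
    prev_session = prev_page = None
    island = 0
    for session_id, page_name, _ in ordered:
        if session_id != prev_session:
            island = 0   # new session: restart the island numbering
        elif page_name != prev_page:
            island += 1  # a "gap": the value changed, so a new island starts
        islands.append((session_id, island))
        prev_session, prev_page = session_id, page_name
    sizes = Counter(islands)  # number of rows in each island
    return [sizes[key] for key in islands]


print(run_lengths(rows))  # [1, 1, 4, 4, 4, 4, 2, 2, 1, 1, 1]
```

The printed list matches the f0_ column in the query results above, row for row.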
You can do this very easily with the COUNT function and an OVER (PARTITION BY ...) clause:
SELECT session_id, page_content_group, page1, flow,
COUNT(page_content_group) OVER(PARTITION BY page_content_group ) AS count_column
FROM your_table
(P.S. Using the page1 column gives you the same result.)
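One thing worth checking before relying on a plain PARTITION BY count: it counts every row that shares the value, whether or not those rows are adjacent. The small Python sketch below compares the two notions on the page_group values from the question's sample data. The variable names are illustrative only, and Python's Counter and groupby stand in for the SQL window functions.

```python
from collections import Counter
from itertools import groupby

# page_group values in steps_flow order, taken from the question's sample data.
page_groups = [
    "Home Page", "Site Search", "Range - PIP", "Range - PIP", "Range - PIP",
    "Range - PIP", "Site Search", "Site Search", "Ideas", "Range - PLP",
    "Range - PIP",
]

# Equivalent of COUNT(...) OVER (PARTITION BY page_group): total rows per value.
totals = Counter(page_groups)
partition_counts = [totals[g] for g in page_groups]

# Run length: number of rows in the same unbroken streak of equal values.
streak_counts = []
for _, streak in groupby(page_groups):
    n = len(list(streak))
    streak_counts.extend([n] * n)

print(partition_counts)  # [1, 3, 5, 5, 5, 5, 3, 3, 1, 1, 5]
print(streak_counts)     # [1, 1, 4, 4, 4, 4, 2, 2, 1, 1, 1]
```

On this data the two lists differ wherever a value reappears later in the session (Site Search and Range - PIP), so the partition-only count does not reproduce the expected f0 column.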