How do I count how many times a column holds the same value in a row (consecutively)?

Question Votes: 0 Answers: 2

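I am looking for a BigQuery solution that counts how many times a column holds the same value in consecutive rows. I currently have the table below, and I would like a new column that shows, for each row, how many times the page value repeats within its consecutive run in the flow.

Data: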

WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT(1017911 AS session_id, 'Home Page' AS page_group, 'Home Page' AS page_name, 1 AS steps_flow),
    (1017911, 'Site Search', 'Site Search', 2),
    (1017911, 'Range - PIP', 'Wall shelves', 3),
    (1017911, 'Range - PIP', 'Wall shelves', 4),
    (1017911, 'Range - PIP', 'Wall shelves', 5),
    (1017911, 'Range - PIP', 'Wall shelves', 6),
    (1017911, 'Site Search', 'Site Search', 7),
    (1017911, 'Site Search', 'Site Search', 8),
    (1017911, 'Ideas', 'Ideas', 9),
    (1017911, 'Range - PLP', 'EKTORP series', 10),
    (1017911, 'Range - PIP', 'Sofas', 11)
  ])
)
SELECT * FROM t ORDER BY steps_flow;

(the new column f0 is the expected output):

[image: current data]

[image: expected output]

I can't work out the right query.

google-bigquery count
2 Answers
0
votes

This is a kind of gaps-and-islands problem. You might consider the query below:

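-- Innermost query: gap is TRUE when page_name differs from the previous row.
-- Middle query: a running count of gaps numbers each consecutive run (island).
-- Outer query: counts the rows of each island; session_id is included in the
-- partition so that islands from different sessions are not merged.
SELECT * EXCEPT(gap, island), COUNT(1) OVER (PARTITION BY session_id, island) FROM (
  SELECT *, COUNTIF(gap) OVER w1 AS island FROM (
    SELECT *, page_name <> LAG(page_name) OVER w0 AS gap
      FROM sample_table
    WINDOW w0 AS (PARTITION BY session_id ORDER BY steps_flow)
  ) WINDOW w1 AS (PARTITION BY session_id ORDER BY steps_flow)
);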

-- Query results

+------------+-------------+---------------+------------+-----+
| session_id | page_group  |   page_name   | steps_flow | f0_ |
+------------+-------------+---------------+------------+-----+
|    1017911 | Home Page   | Home Page     |          1 |   1 |
|    1017911 | Site Search | Site Search   |          2 |   1 |
|    1017911 | Range - PIP | Wall shelves  |          3 |   4 |
|    1017911 | Range - PIP | Wall shelves  |          4 |   4 |
|    1017911 | Range - PIP | Wall shelves  |          5 |   4 |
|    1017911 | Range - PIP | Wall shelves  |          6 |   4 |
|    1017911 | Site Search | Site Search   |          7 |   2 |
|    1017911 | Site Search | Site Search   |          8 |   2 |
|    1017911 | Ideas       | Ideas         |          9 |   1 |
|    1017911 | Range - PLP | EKTORP series |         10 |   1 |
|    1017911 | Range - PIP | Sofas         |         11 |   1 |
+------------+-------------+---------------+------------+-----+
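
To see how the island numbering comes about, it can help to keep the helper columns in the output. The following is only a sketch, not part of the original answer: it inlines a shortened copy of the sample data, and the CTE names `flagged` and `numbered` and the alias `run_length` are made up for illustration.

```sql
-- Illustration only: a shortened copy of the sample data.
WITH sample_table AS (
  SELECT * FROM UNNEST([
    STRUCT('1017911' AS session_id, 'Site Search' AS page_name, 2 AS steps_flow),
    ('1017911', 'Wall shelves', 3),
    ('1017911', 'Wall shelves', 4),
    ('1017911', 'Site Search', 7),
    ('1017911', 'Site Search', 8)
  ])
),
flagged AS (
  -- gap is TRUE when page_name differs from the previous row,
  -- FALSE when it repeats, and NULL on the first row of a session.
  SELECT *,
         page_name <> LAG(page_name)
           OVER (PARTITION BY session_id ORDER BY steps_flow) AS gap
  FROM sample_table
),
numbered AS (
  -- COUNTIF only counts TRUE values, so the running total increases
  -- exactly when a new consecutive run starts.
  SELECT *,
         COUNTIF(gap)
           OVER (PARTITION BY session_id ORDER BY steps_flow) AS island
  FROM flagged
)
SELECT session_id, page_name, steps_flow, gap, island,
       COUNT(*) OVER (PARTITION BY session_id, island) AS run_length
FROM numbered
ORDER BY session_id, steps_flow;
```

In this reduced data set, 'Site Search' ends up in two different islands (step 2 on its own, steps 7 and 8 together), which is why it is counted separately for each run.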

You can test the query with the following sample data.

WITH sample_table AS (
  SELECT '1017911' session_id, 'Home Page' page_group, 'Home Page' page_name, 1 steps_flow  UNION ALL
  SELECT '1017911', 'Site Search', 'Site Search', 2 UNION ALL
  SELECT '1017911', 'Range - PIP', 'Wall shelves', 3 UNION ALL
  SELECT '1017911', 'Range - PIP', 'Wall shelves', 4 UNION ALL
  SELECT '1017911', 'Range - PIP', 'Wall shelves', 5 UNION ALL
  SELECT '1017911', 'Range - PIP', 'Wall shelves', 6 UNION ALL
  SELECT '1017911', 'Site Search', 'Site Search', 7 UNION ALL
  SELECT '1017911', 'Site Search', 'Site Search', 8 UNION ALL
  SELECT '1017911', 'Ideas', 'Ideas',9 UNION ALL
  SELECT '1017911', 'Range - PLP', 'EKTORP series',10 UNION ALL
  SELECT '1017911', 'Range - PIP', 'Sofas',11 
)
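
As an alternative, the same runs can usually be identified with the classic "difference of two row numbers" technique. This is a sketch under assumptions, not the answer's own method: it uses a shortened copy of the sample data, and the names `grouped`, `grp` and `run_length` are hypothetical.

```sql
-- Illustration only: a shortened copy of the sample data.
WITH sample_table AS (
  SELECT * FROM UNNEST([
    STRUCT('1017911' AS session_id, 'Site Search' AS page_name, 2 AS steps_flow),
    ('1017911', 'Wall shelves', 3),
    ('1017911', 'Wall shelves', 4),
    ('1017911', 'Site Search', 7),
    ('1017911', 'Site Search', 8)
  ])
),
grouped AS (
  -- Within one consecutive run both row numbers advance together,
  -- so their difference stays constant for the whole run.
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY steps_flow)
         - ROW_NUMBER() OVER (PARTITION BY session_id, page_name ORDER BY steps_flow) AS grp
  FROM sample_table
)
SELECT session_id, page_name, steps_flow,
       -- grp is only unique per page_name, so both belong in the partition.
       COUNT(*) OVER (PARTITION BY session_id, page_name, grp) AS run_length
FROM grouped
ORDER BY session_id, steps_flow;
```

Whether this reads better than the LAG/COUNTIF version is a matter of taste; both rely on `steps_flow` giving an unambiguous order within a session.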

0
votes

You can do this very easily with the COUNT function and an OVER(PARTITION BY) clause:

SELECT session_id, page_content_group, page1, flow,
   COUNT(page_content_group) OVER(PARTITION BY page_content_group ) AS count_column
FROM your_table

(P.S. Using the page1 column gives the same result.)
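
One caveat worth checking before relying on this approach: partitioning by the page column alone counts every occurrence of a value, not only the consecutive ones. The sketch below uses the question's column names and a shortened copy of its data (the alias `total_occurrences` is made up) to show the effect.

```sql
-- Illustration only: 'Site Search' occurs at step 2 and again at steps 7 and 8.
WITH sample_table AS (
  SELECT * FROM UNNEST([
    STRUCT('1017911' AS session_id, 'Site Search' AS page_name, 2 AS steps_flow),
    ('1017911', 'Wall shelves', 3),
    ('1017911', 'Site Search', 7),
    ('1017911', 'Site Search', 8)
  ])
)
SELECT session_id, page_name, steps_flow,
       -- Counts all rows sharing the same page_name, adjacent or not.
       COUNT(*) OVER (PARTITION BY session_id, page_name) AS total_occurrences
FROM sample_table
ORDER BY steps_flow;
```

Here every 'Site Search' row gets 3, whereas the expected output in the question gives 1 for the isolated visit and 2 for the pair, so a plain PARTITION BY only matches when each value occurs in a single consecutive run.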
