BigQuery (Google Analytics data): querying two different 'hits.customDimensions.index' values within the same 'hits.hitNumber'

Question (votes: 0, answers: 1)

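My goal:

If the following two hits.customDimensions.index values and their associated hits.customDimensions.value both occur in the same hits.hitNumber, count 1 for that session (as long as the main query stays nested, each row is one session):

['hits.customDimensions.index' = 43 with an associated 'hits.customDimensions.value' IN ('login', 'payment', 'order', 'thankyou')] AND ['hits.customDimensions.index' = 10 with an associated 'hits.customDimensions.value' = 'checkout'], both in the same hits.hitNumber.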

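My problem:

I don't know how to check two different hits.customDimensions.value entries within the same hits.hitNumber inside a subquery, without resorting to separate WITH tables. If it is possible, I'm sure the query would be quite easy and short. Because I don't know how to express this use case in a subquery, I used a workaround that adds up to 5 WITH tables. I'm hoping there is a simpler way to query this use case.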

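Explanation of the workaround query:

Table 1: queries everything except the "problem metric".

Tables 2-3: each table queries one hits.customDimensions.index with its matching hits.customDimensions.value, filtered to the correct values, along with sessionId and hitNumber.

Table 4: left joins table 2 with table 3 on date, sessionId and hitNumber. Basically, if a hitNumber and sessionId combination appears in both table 2 and table 3, I count 1.

Table 5: left joins table 1 with table 4 to bring the data together.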

#Table1 - complete data except session_atleast_loginCheckout
WITH
  prepared_data AS (
  SELECT
    date,
    SUM((SELECT 1 FROM UNNEST(hits) WHERE CAST(eCommerceAction.action_type AS INT64) BETWEEN 4 AND 6 LIMIT 1)) AS sessions_atleast_basket, 
    #insert in this row query for sessions_atleast_loginCheckout
    SUM((SELECT 1 FROM UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd WHERE index = 43 AND value IN ('payment', 'order', 'thankyou') LIMIT 1)) AS sessions_atleast_payment,
  FROM
    `big-query-221916.172008714.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND totals.visits = 1 
  GROUP BY
    date
  ),


#Table2 - data for hits.customDimensions.index = 10 AND associated hits.customDimensions.value = 'checkout' with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index10_pagetype_data AS (
  SELECT
    date AS date,
    CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
    h.hitNumber AS hitNumber,
    IF(hcd.value IS NOT NULL, 1, NULL) AS pagetype_checkout
  FROM
    `big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 10 AND VALUE = 'checkout'  AND h.type = 'PAGE' AND totals.visits = 1),


#Table3 - data for hits.customDimensions.index = 43 AND associated hits.customDimensions.value IN ('login', 'register', 'payment', 'order','thankyou') with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index43_pagelevel1_data AS (
  SELECT
    date AS date,
    CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
    h.hitNumber AS hitNumber,
    IF(hcd.value IS NOT NULL, 1, NULL) AS pagelevel1_login_to_thankyou
  FROM
    `big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 43 AND VALUE IN ('login', 'register', 'payment', 'order', 'thankyou') AND h.type = 'PAGE'  
),


#table4 - left join table2 and table 3 on sessionId and hitNumber to get sessions_atleast_loginCheckout
loginChackout_output_data AS(
  SELECT
    a.date AS date,
    COUNT(DISTINCT a.sessionId) AS sessions_atleast_loginCheckout 
  FROM
    loginCheckout_index10_pagetype_data AS a
  LEFT JOIN 
    loginCheckout_index43_pagelevel1_data AS b 
  ON
    a.date = b.date AND
    a.sessionId = b.sessionId AND
    a.hitNumber = b.hitNumber
  WHERE
    pagelevel1_login_to_thankyou IS NOT NULL
  GROUP BY
    a.date
)



#table5 - leftjoin table1 with table4 to get all data together
SELECT
  prep.date,
  prep.sessions_atleast_basket,
  log.sessions_atleast_loginCheckout,
  prep.sessions_atleast_payment
FROM
    prepared_data AS prep
  LEFT JOIN
    loginChackout_output_data as log
  ON
    prep.date = log.date



sql google-analytics google-bigquery
1 answer
0
votes

It's a bit like Inception, but it may help to remember that the input to unnest() is an array and its output is table rows...
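One way to read that hint, as a sketch rather than the answerer's actual query: since UNNEST(hits) yields one row per hit, each hit's own customDimensions array can be unnested again inside that row, once per condition, so both checks are evaluated against the same hitNumber. The inline sample_sessions data below is hypothetical and only stands in for the ga_sessions_* export so the query runs on its own. The value list follows the goal statement, while the workaround query also includes 'register'.

```sql
-- Hypothetical inline data standing in for ga_sessions_*; only the fields used here.
WITH sample_sessions AS (
  -- Session A: index 10 and index 43 both match on hit 1.
  SELECT
    '20190101' AS date,
    [
      STRUCT(1 AS hitNumber, 'PAGE' AS type,
             [STRUCT(10 AS index, 'checkout' AS value),
              STRUCT(43 AS index, 'login' AS value)] AS customDimensions),
      STRUCT(2 AS hitNumber, 'PAGE' AS type,
             [STRUCT(10 AS index, 'home' AS value),
              STRUCT(43 AS index, 'start' AS value)] AS customDimensions)
    ] AS hits
  UNION ALL
  -- Session B: the two matching values sit on different hits.
  SELECT
    '20190101' AS date,
    [
      STRUCT(1 AS hitNumber, 'PAGE' AS type,
             [STRUCT(10 AS index, 'checkout' AS value),
              STRUCT(43 AS index, 'start' AS value)] AS customDimensions),
      STRUCT(2 AS hitNumber, 'PAGE' AS type,
             [STRUCT(10 AS index, 'home' AS value),
              STRUCT(43 AS index, 'login' AS value)] AS customDimensions)
    ] AS hits
)
SELECT
  date,
  -- Same SUM((SELECT 1 ... LIMIT 1)) pattern as the other metrics in the question:
  -- a session contributes 1 if at least one hit satisfies both conditions.
  SUM((
    SELECT 1
    FROM UNNEST(hits) AS h
    WHERE h.type = 'PAGE'
      -- Condition 1, checked against this hit's own customDimensions array.
      AND EXISTS (
        SELECT 1 FROM UNNEST(h.customDimensions) AS cd
        WHERE cd.index = 10 AND cd.value = 'checkout')
      -- Condition 2, checked against the same hit.
      AND EXISTS (
        SELECT 1 FROM UNNEST(h.customDimensions) AS cd
        WHERE cd.index = 43
          AND cd.value IN ('login', 'payment', 'order', 'thankyou'))
    LIMIT 1
  )) AS sessions_atleast_loginCheckout
FROM sample_sessions
GROUP BY date
```

With this sample data only session A should be counted, because session B never has both values on one hit. If the approach holds for the real export, the SUM(...) expression could presumably be placed in the first SELECT alongside sessions_atleast_basket and sessions_atleast_payment, which would make the extra WITH tables and joins unnecessary.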
