My goal:
Count 1 for the session if the following two hits.customDimensions.index entries and their associated hits.customDimensions.value appear in the same hits.hitNumber (assuming the main query is still nested, so every row is 1 session):
['hits.customDimensions.index' = 43 and associated 'hits.customDimensions.value' IN ('login', 'payment', 'order', 'thankyou')] AND ['hits.customDimensions.index' = 10 and associated 'hits.customDimensions.value' = 'checkout'] [in the same hits.hitNumber]
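My problem:
I don't know how to query two different hits.customDimensions.value entries within the same hits.hitNumber in a subquery, without using separate WITH tables. If that is possible, I'm sure the query would be very easy and short. Because I don't know how to express this use case in a subquery, I used a workaround with 5 WITH tables in total. I'm hoping there is a simpler way to query this use case.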
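Explanation of the workaround query:
Table 1: queries everything except the "problem metric"
Tables 2-3: each table queries one hits.customDimensions.index with its matching hits.customDimensions.value, filtered to the correct values, together with sessionId and hitNumber
Table 4: left joins table 2 with table 3 on date, sessionId and hitNumber. Basically, I count 1 if the combination of hitNumber and sessionId matches between table 2 and table 3
Table 5: left joins table 1 with table 4 to bring all the data together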
#Table1 - complete data except session_atleast_loginCheckout
WITH
prepared_data AS (
SELECT
date,
SUM((SELECT 1 FROM UNNEST(hits) WHERE CAST(eCommerceAction.action_type AS INT64) BETWEEN 4 AND 6 LIMIT 1)) AS sessions_atleast_basket,
#insert in this row query for sessions_atleast_loginCheckout
SUM((SELECT 1 FROM UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd WHERE index = 43 AND value IN ('payment', 'order', 'thankyou') LIMIT 1)) AS sessions_atleast_payment,
FROM
`big-query-221916.172008714.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND totals.visits = 1
GROUP BY
date
),
#Table2 - data for hits.customDimensions.index = 10 AND associated hits.customDimensions.value = 'checkout' with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index10_pagetype_data AS (
SELECT
date AS date,
CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
h.hitNumber AS hitNumber,
IF(hcd.value IS NOT NULL, 1, NULL) AS pagetype_checkout
FROM
`big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 10 AND VALUE = 'checkout' AND h.type = 'PAGE' AND totals.visits = 1),
#Table3 - data for hits.customDimensions.index = 43 AND associated hits.customDimensions.value IN ('login', 'register', 'payment', 'order','thankyou') with hits.hitNumber and sessionId (join later based on hitNumber and sessionId)
loginCheckout_index43_pagelevel1_data AS (
SELECT
date AS date,
CONCAT(fullVisitorId, '/', CAST( visitStartTime AS STRING)) AS sessionId,
h.hitNumber AS hitNumber,
IF(hcd.value IS NOT NULL, 1, NULL) AS pagelevel1_login_to_thankyou
FROM
`big-query-221916.172008714.ga_sessions_*` AS o, UNNEST(hits) as h, UNNEST(h.customDimensions) as hcd
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)) AND hcd.index = 43 AND VALUE IN ('login', 'register', 'payment', 'order', 'thankyou') AND h.type = 'PAGE'
),
#table4 - left join table2 and table 3 on sessionId and hitNumber to get sessions_atleast_loginCheckout
loginChackout_output_data AS(
SELECT
a.date AS date,
COUNT(DISTINCT a.sessionId) AS sessions_atleast_loginCheckout
FROM
loginCheckout_index10_pagetype_data AS a
LEFT JOIN
loginCheckout_index43_pagelevel1_data AS b
ON
a.date = b.date AND
a.sessionId = b.sessionId AND
a.hitNumber = b.hitNumber
WHERE
pagelevel1_login_to_thankyou IS NOT NULL
GROUP BY
date
)
#table5 - leftjoin table1 with table4 to get all data together
SELECT
prep.date,
prep.sessions_atleast_basket,
log.sessions_atleast_loginCheckout,
prep.sessions_atleast_payment
FROM
prepared_data AS prep
LEFT JOIN
loginChackout_output_data as log
ON
prep.date = log.date
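My goal: count 1 for the session if the following two hits.customDimensions.index entries and their associated hits.customDimensions.value appear in the same hits.hitNumber (if the main query is ..., so every row is 1 session)
It's a bit like Inception, but it may help to remember that unnest() takes an array as input and returns table rows as output...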
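Because UNNEST() turns an array into rows, it can be applied twice to the same h.customDimensions array inside one subquery over UNNEST(hits), which keeps both conditions on the same hit. The following is only a minimal sketch of that idea, not a tested replacement for the full query: the sample_sessions CTE and its contents are made-up data shaped like the GA export (only the date and hits fields are modelled), and the value list is the one from the stated goal (add 'register' if it should match the workaround's Table 3).

```sql
# Hypothetical sample data: two sessions on the same (made-up) date.
WITH sample_sessions AS (
  # Session A: index 10 = 'checkout' and index 43 = 'login' on the same hit.
  SELECT
    '20190101' AS date,
    [STRUCT(1 AS hitNumber, 'PAGE' AS type,
            [STRUCT(10 AS index, 'checkout' AS value),
             STRUCT(43 AS index, 'login' AS value)] AS customDimensions)] AS hits
  UNION ALL
  # Session B: both values occur, but on different hits.
  SELECT
    '20190101' AS date,
    [STRUCT(1 AS hitNumber, 'PAGE' AS type,
            [STRUCT(10 AS index, 'checkout' AS value),
             STRUCT(43 AS index, 'home' AS value)] AS customDimensions),
     STRUCT(2 AS hitNumber, 'PAGE' AS type,
            [STRUCT(10 AS index, 'product' AS value),
             STRUCT(43 AS index, 'payment' AS value)] AS customDimensions)] AS hits
)
SELECT
  date,
  # Yields 1 for a session if at least one hit satisfies both conditions, else NULL.
  SUM((
    SELECT 1
    FROM UNNEST(hits) AS h
    WHERE h.type = 'PAGE'
      # Each EXISTS scans the custom dimensions of the current hit only.
      AND EXISTS (SELECT 1 FROM UNNEST(h.customDimensions)
                  WHERE index = 10 AND value = 'checkout')
      AND EXISTS (SELECT 1 FROM UNNEST(h.customDimensions)
                  WHERE index = 43 AND value IN ('login', 'payment', 'order', 'thankyou'))
    LIMIT 1
  )) AS sessions_atleast_loginCheckout
FROM sample_sessions
GROUP BY date
```

With this sample data, session A should be counted and session B should not, since B's matching values sit on different hits. In the original query, the SUM((...)) expression could presumably go on the row marked "#insert in this row query for sessions_atleast_loginCheckout" in prepared_data, which would make tables 2 to 5 unnecessary.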