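Using Oracle SQL, I am trying to count the total number of unique visits to a website. The table I am querying has no timestamp; it only includes the date (DDMMYY), with no minutes or seconds, and each row in the table represents a customer clicking on a page. The table assigns a new "session" every hour, whether or not that actually reflects a new visit from the customer's POV. What I have to do is use non-consecutive sessions as a proxy for unique visits: if there is an hour's break between sessions, the preceding run of consecutive sessions is one visit. I define a session as a unique combination of customer ID + session date + session hour. If there are consecutive session hours within a customer + day combination, they count as one visit. The HOUR field contains a string value that concatenates the date with the hour. To calculate the visit count properly, I will need to parse out the hour and subtract the previous row's hour from it (lag) to determine whether there has been a gap of more than one hour.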
Example of Raw Data:
TRANS_TO_DATE CUSTOMER_ID HOUR
10/21/17 1007589445 October 21, 2017, Hour 1
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 2
10/21/17 1007589445 October 21, 2017, Hour 3
10/21/17 1007589445 October 21, 2017, Hour 5
10/21/17 1007589445 October 21, 2017, Hour 6
10/21/17 1007589445 October 21, 2017, Hour 23
10/21/17 1007589445 October 21, 2017, Hour 23
10/21/17 1007589445 October 21, 2017, Hour 23
11/1/17 1007589445 November 1, 2017, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 10
1/1/18 1007589445 January 1, 2018, Hour 11
1/1/18 1007589445 January 1, 2018, Hour 14
1/1/18 1007589445 January 1, 2018, Hour 20
1/1/18 1007589445 January 1, 2018, Hour 22
The visit grouping should actually look like this:
Customer_id Day Hour Visit Grouping
1007589445 October 21, 2017 1 Visit 1
1007589445 October 21, 2017 2 Visit 1
1007589445 October 21, 2017 3 Visit 1
1007589445 October 21, 2017 5 Visit 2
1007589445 October 21, 2017 6 Visit 2
1007589445 October 21, 2017 23 Visit 3
1007589445 November 1, 2017 10 Visit 1
1007589445 January 1, 2018 10 Visit 1
1007589445 January 1, 2018 11 Visit 1
1007589445 January 1, 2018 14 Visit 2
1007589445 January 1, 2018 20 Visit 3
1007589445 January 1, 2018 22 Visit 4
Customer 1007589445 has:
3 visits on October 21, 2017 - 1 visit on November 1, 2017 - 4 visits on January 1, 2018
Total visits: 8
Below is the SQL I have so far; it needs to be modified to meet the criteria above.
select
CUSTOMER_ID,
TRANS_TO_DATE,
HOUR,
count (HOUR) as visits
from mstr_clickstream_vw
where trans_to_date between start_date and end_date
and web_store_ind='US'
group by CUSTOMER_ID, TRANS_TO_DATE,HOUR
You can extract the hour from the string with:
cast(trim(substr(hour, -2)) as int)
Then use it to assign sessions with lag() and a cumulative conditional sum:
select cs.*,
       -- running total of "new visit" flags: every row gets its visit number
       sum(case when trans_to_date = prev_ttd and prev_hh = hh then 0      -- same hour
                when trans_to_date = prev_ttd and prev_hh = hh - 1 then 0  -- next hour
                when hh = 0 and prev_hh = 23 and trans_to_date = prev_ttd + interval '1' day then 0  -- across midnight
                else 1                                                     -- gap: new visit
           end) over (partition by customer_id order by trans_to_date, hh) as grouping
from (select cs.*,
             lag(trans_to_date) over (partition by customer_id order by trans_to_date, hh) as prev_ttd,
             lag(hh) over (partition by customer_id order by trans_to_date, hh) as prev_hh
      from (select cs.*,
                   cast(trim(substr(hour, -2)) as int) as hh  -- last two characters hold the hour
            from mstr_clickstream_vw cs
           ) cs
     ) cs;
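If what you need is the visit count rather than a label on every row, one option is to flag the first row of each run of hours and sum the flags. The following is only a sketch and has not been run against your view. It follows the question's customer + day definition, so unlike the query above it does not merge Hour 23 with Hour 0 of the next day. It assumes TRANS_TO_DATE carries no time component, and the names hh and is_new_visit are made up for illustration.

```sql
-- Sketch: visits per customer and day, where a visit is a run of
-- consecutive hours within the same customer + day.
select customer_id,
       trans_to_date,
       sum(is_new_visit) as visits
from (select customer_id,
             trans_to_date,
             case
               -- same hour or the hour right after the previous row: same visit
               when hh - lag(hh) over (partition by customer_id, trans_to_date
                                       order by hh) <= 1 then 0
               -- first row of the day (lag is null) or a gap of more than one hour
               else 1
             end as is_new_visit
      from (select customer_id,
                   trans_to_date,
                   cast(trim(substr(hour, -2)) as int) as hh
            from mstr_clickstream_vw
            where web_store_ind = 'US')
     )
group by customer_id, trans_to_date
order by customer_id, trans_to_date;
```

For the sample rows in the question this should give 3, 1 and 4 visits for the three days, and summing visits per customer gives the total of 8. The question's date filter on trans_to_date can go in the innermost query.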