Oracle SQL: counting consecutive site visits based on a substring and the previous row (lag)


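Using Oracle SQL, I am trying to count the total number of unique visits to a website. The table I am querying has no timestamp with minutes and seconds, only a DDMMYY date, and every row in the table represents a customer click on a page. The table assigns a new "session" every hour, regardless of whether that reflects a new visit from the customer's point of view.

What I need to do is use non-consecutive sessions as a proxy for unique visits: if there is a break of an hour between sessions, the preceding run of consecutive sessions counts as one visit. I define a visit as a unique combination of customer ID + session date + session hour, and consecutive session hours within the same customer + day combination count as a single visit.

The HOUR field contains a string that concatenates the date with the hour. To count visits properly, I need to parse out the hour and subtract the value from the previous row (lag) to determine whether there is a "break" of more than one hour.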

Example of Raw Data:
TRANS_TO_DATE   CUSTOMER_ID HOUR
10/21/17        1007589445  October 21, 2017, Hour 1
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 2
10/21/17        1007589445  October 21, 2017, Hour 3
10/21/17        1007589445  October 21, 2017, Hour 5
10/21/17        1007589445  October 21, 2017, Hour 6
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
10/21/17        1007589445  October 21, 2017, Hour 23
11/1/17         1007589445  November 1, 2017, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 10
1/1/18          1007589445  January  1, 2018, Hour 11
1/1/18          1007589445  January  1, 2018, Hour 14
1/1/18          1007589445  January  1, 2018, Hour 20
1/1/18          1007589445  January  1, 2018, Hour 22

The visits actually break down like this:

Customer_id  Day               Hour  Visit Grouping
1007589445   October 21, 2017  1     Visit 1
1007589445   October 21, 2017  2     Visit 1
1007589445   October 21, 2017  3     Visit 1
1007589445   October 21, 2017  5     Visit 2
1007589445   October 21, 2017  6     Visit 2
1007589445   October 21, 2017  23    Visit 3
1007589445   November 1, 2017  10    Visit 1
1007589445   January 1, 2018   10    Visit 1
1007589445   January 1, 2018   11    Visit 1
1007589445   January 1, 2018   14    Visit 2
1007589445   January 1, 2018   20    Visit 3
1007589445   January 1, 2018   22    Visit 4

Customer 1007589445 has:

3 visits on October 21, 2017 - 1 visit on November 1, 2017 - 4 visits on January 1, 2018

Total visits: 8

Below is the SQL I have so far; it needs to be modified to meet the criteria above.

select CUSTOMER_ID,
       TRANS_TO_DATE,
       HOUR,
       count(HOUR) as visits
from mstr_clickstream_vw
where trans_to_date between start_date and end_date
  and web_store_ind = 'US'
group by CUSTOMER_ID, TRANS_TO_DATE, HOUR

1 Answer

Extract the hour from the end of the string:

cast(trim(substr(hour, -2)) as int)
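
To see what the extraction expression does to the sample HOUR strings, here is a minimal Python sketch of the same three steps (take the last two characters, trim, convert to an integer). The function name `parse_hour` is hypothetical, and the sketch assumes the hour is always the last one or two characters of the string, as in the sample data.

```python
def parse_hour(hour_field: str) -> int:
    """Mimic cast(trim(substr(hour, -2)) as int) on one HOUR string."""
    last_two = hour_field[-2:]    # like substr(hour, -2): "r 1" -> " 1", "23" -> "23"
    return int(last_two.strip())  # like trim(...) followed by cast(... as int)


print(parse_hour("October 21, 2017, Hour 1"))   # 1
print(parse_hour("October 21, 2017, Hour 23"))  # 23
```

A single-digit hour comes back from the substring with a leading space, which is why the trim is needed before the cast.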

Then use that hour value to assign sessions, using lag() and a cumulative conditional sum:

select cs.*,
       -- running count of rows that start a new visit = visit number per customer
       sum(case when trans_to_date = prev_ttd and prev_hh = hh then 0      -- same hour as previous row
                when trans_to_date = prev_ttd and prev_hh = hh - 1 then 0  -- next consecutive hour
                when hh = 0 and prev_hh = 23 and trans_to_date = prev_ttd + interval '1' day then 0  -- run continues past midnight
                else 1
           end) over (partition by customer_id order by trans_to_date, hh) as visit_grouping
from (select cs.*,
             lag(trans_to_date) over (partition by customer_id order by trans_to_date, hh) as prev_ttd,
             lag(hh) over (partition by customer_id order by trans_to_date, hh) as prev_hh
      from (select cs.*,
                   -- hour parsed from the trailing characters of the HOUR string
                   cast(trim(substr(hour, -2)) as int) as hh
            from mstr_clickstream_vw cs
           ) cs
      ) cs;
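
As a sanity check on the logic, here is a small Python sketch that applies the same rule (same hour, next hour, or hour 23 followed by hour 0 on the next day continues the visit; anything else starts a new one) to the sample rows from the question. The names `assign_visits` and `total_visits` are hypothetical, and the hours are assumed to be already parsed into integers.

```python
from datetime import date, timedelta

# Sample data from the question: hours per day for customer 1007589445.
sample = {
    date(2017, 10, 21): [1, 2, 2, 2, 3, 5, 6, 23, 23, 23],
    date(2017, 11, 1): [10],
    date(2018, 1, 1): [10, 10, 11, 14, 20, 22],
}
rows = [(1007589445, day, hh) for day, hours in sample.items() for hh in hours]


def assign_visits(rows):
    """Return (customer_id, day, hh, visit_no), numbering visits per customer."""
    out = []
    prev = None      # previous row, playing the role of lag()
    visit_no = 0
    for cust, day, hh in sorted(rows):   # order by customer, date, hour
        if prev is None or prev[0] != cust:
            visit_no = 0                 # new customer partition
            prev = None
        continues = prev is not None and (
            (day == prev[1] and hh - prev[2] in (0, 1))
            or (hh == 0 and prev[2] == 23 and day == prev[1] + timedelta(days=1))
        )
        if not continues:
            visit_no += 1                # a break starts a new visit
        out.append((cust, day, hh, visit_no))
        prev = (cust, day, hh)
    return out


def total_visits(rows):
    """Total visits per customer = highest running visit number."""
    totals = {}
    for cust, _, _, visit_no in assign_visits(rows):
        totals[cust] = max(totals.get(cust, 0), visit_no)
    return totals


print(total_visits(rows))  # {1007589445: 8}
```

The running number here continues across days for a customer rather than restarting at 1 each day, so the total visit count for a customer is simply the highest number reached, which matches the expected total of 8.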