Calculating growth and retention rates with SQL

Problem description

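I wrote a query to calculate the retention rate and the growth rates of new and returning students. The code below returns a result similar to this.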

Row  visit_month    student_type    numberofstd  growth 
1      2013          new                574       null
2      2014          new                220       -62%
3      2014        retained             442       245%
4      2015          new                199       -10%
5      2015        retained             533        21%
6      2016          new                214        8%
7      2016        retained             590        11%
8      2016        returning            1         -100%

The query I tried:

with visit_log AS (
    SELECT studentid,
            cast(substr(session, 1, 4) as numeric) as visit_month,
    FROM abaresult
    GROUP BY 1,
             2
    ORDER BY 1,
             2),
  time_lapse_2 AS (
        SELECT studentid,
               Visit_month,
               lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
         FROM visit_log),
  time_diff_calculated_2 AS (
        SELECT studentid,
               visit_month,
               lag,
               visit_month - lag AS time_diff
         FROM time_lapse_2),

  student_categorized AS (
        SELECT studentid,
               visit_month,
               CASE
                        WHEN time_diff=1 THEN 'retained'
                        WHEN time_diff>1 THEN 'returning'
                        WHEN time_diff IS NULL THEN 'new'
               END AS student_type,
    FROM time_diff_calculated_2)

SELECT visit_month,
         student_type,
         Count(distinct studentid) as numberofstd,
         ROUND(100 * (COUNT(student_type) - LAG(COUNT(student_type), 1) OVER (ORDER BY student_type)) / LAG(COUNT(student_type), 1) OVER (ORDER BY student_type),0) || '%' AS growth
  FROM student_categorized
group by 1,2
order by 1,2

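The query above calculates the retained, new, and returning rates relative to the count of the previous row, ordered by student_type.

Instead, I am looking for a way to calculate these figures as the visits per category divided by the visits per month. Is there a way to do this?

I am trying to get a table similar to this: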

Row  visit_month    student_type  totalstd  numberofstd  growth 
1      2013          new           574         574       null
2      2014          new           662         220       62%
3      2014        retained        662         442       22%
4      2015          new           732         199       10%
5      2015        retained        732         533       21%
6      2016          new           804         214       8%
7      2016        retained        804         590       11%
8      2016        returning       804         1         100%

Note:

totalstd is the total number of students per session, obtained as new + retained + returning.

The growth values shown are hypothetical.
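
As a rough illustration of the totalstd definition, the sketch below sums the categories per period using the counts from the first result table above, then divides each category count by that total. The variable names and the share column are illustrative assumptions, not part of the original query.

```python
from collections import defaultdict

# (visit_month, student_type, numberofstd) copied from the first result table
rows = [
    (2013, "new", 574),
    (2014, "new", 220),
    (2014, "retained", 442),
    (2015, "new", 199),
    (2015, "retained", 533),
    (2016, "new", 214),
    (2016, "retained", 590),
    (2016, "returning", 1),
]

# totalstd = new + retained + returning for each visit_month
totalstd = defaultdict(int)
for visit_month, _student_type, numberofstd in rows:
    totalstd[visit_month] += numberofstd

# Hypothetical "share" column: category count divided by the period total, in percent
shares = [
    (visit_month, student_type, totalstd[visit_month], numberofstd,
     round(100 * numberofstd / totalstd[visit_month]))
    for visit_month, student_type, numberofstd in rows
]

for row in shares:
    print(row)
```

For 2014 this gives a total of 662, with new students at 33% and retained students at 67%. Summing the 2016 rows gives 805, one more than the 804 listed in the desired table.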

Please help! Thank you.

google-bigquery analytics retention
1 Answer

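I do not have your source data, so I relied on the query and the output you shared.

I wrote some additional code to produce the desired result. Note that I could not run it in BigQuery because I do not have the data, so I tried to avoid any possible errors in my part of the query. The portion between the two lines of asterisks is unchanged and copied from your code. Here is the code (a combination of your code and the extra pieces I added):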

#*****************************************************************
with visit_log AS (
    SELECT studentid,
            cast(substr(session, 1, 4) as numeric) as visit_month,
    FROM abaresult
    GROUP BY 1,
             2
    ORDER BY 1,
             2),
  time_lapse_2 AS (
        SELECT studentid,
               Visit_month,
               lag(visit_month, 1) over (partition BY studentid ORDER BY studentid, visit_month) lag
         FROM visit_log),
  time_diff_calculated_2 AS (
        SELECT studentid,
               visit_month,
               lag,
               visit_month - lag AS time_diff
         FROM time_lapse_2),

  student_categorized AS (
        SELECT studentid,
               visit_month,
               CASE
                        WHEN time_diff=1 THEN 'retained'
                        WHEN time_diff>1 THEN 'returning'
                        WHEN time_diff IS NULL THEN 'new'
               END AS student_type,
    FROM time_diff_calculated_2)
#**************************************************************

#Code I added
#The leading comma continues the WITH list above (a second WITH keyword would be a syntax error)
#Each unique visit_month gets its total student count
, total_stud AS (
SELECT visit_month, COUNT(DISTINCT studentid) AS totalstd
FROM visit_log
GROUP BY 1
),

#After you have your student_categorized temp table, create another one
#It holds the number of students per visit_month per student_type
number_std_monthType AS (
SELECT visit_month, student_type, COUNT(DISTINCT studentid) AS numberofstd
FROM student_categorized
GROUP BY 1, 2
),

#One row per combination of visit_month and student_type
student_categorized2 AS (
SELECT DISTINCT visit_month, student_type
FROM student_categorized
),

#Before the calculation, build the table with all the necessary data
#This is the desired table without the growth column
#Note that t1 and t3 are joined on two keys so the results are correct
final_t AS (
SELECT t1.visit_month,
       t1.student_type,
       t2.totalstd AS totalstd,
       t3.numberofstd
FROM student_categorized2 t1
       LEFT JOIN total_stud AS t2 ON t1.visit_month = t2.visit_month
       LEFT JOIN number_std_monthType t3 ON (t1.visit_month = t3.visit_month AND t1.student_type = t3.student_type)
)

#Now all the values needed for the calculation are in the temp table final_t
#Assumption: growth is taken as each category's share of that visit_month's total,
#i.e. numberofstd / totalstd, as requested in the question
SELECT visit_month, student_type, totalstd, numberofstd,
       CONCAT(CAST(CAST(ROUND(100 * numberofstd / totalstd) AS INT64) AS STRING), '%') AS growth
FROM final_t
ORDER BY 1, 2

Note that once each count has been computed in its own temp table, I combine them in the final table using LEFT JOIN. Also, I did not use your final SELECT statement.
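
The join pattern can be tried without BigQuery. The following is a minimal sketch that runs the same idea in an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up, and the share column is an assumed reading of "visits per category divided by visits per month". SQLite syntax differs from BigQuery in places, so treat this as an illustration rather than a drop-in query.

```python
import sqlite3

# Hypothetical, already-categorized sample rows: (studentid, visit_month, student_type)
sample = [
    (1, 2013, "new"), (2, 2013, "new"),
    (1, 2014, "retained"), (3, 2014, "new"), (4, 2014, "new"),
    (2, 2015, "returning"), (3, 2015, "retained"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_categorized "
    "(studentid INTEGER, visit_month INTEGER, student_type TEXT)"
)
conn.executemany("INSERT INTO student_categorized VALUES (?, ?, ?)", sample)

query = """
WITH total_stud AS (
    -- total distinct students per visit_month
    SELECT visit_month, COUNT(DISTINCT studentid) AS totalstd
    FROM student_categorized
    GROUP BY visit_month
),
number_std_monthType AS (
    -- distinct students per visit_month and student_type
    SELECT visit_month, student_type, COUNT(DISTINCT studentid) AS numberofstd
    FROM student_categorized
    GROUP BY visit_month, student_type
)
SELECT t3.visit_month,
       t3.student_type,
       t2.totalstd,
       t3.numberofstd,
       CAST(ROUND(100.0 * t3.numberofstd / t2.totalstd) AS INTEGER) || '%' AS share
FROM number_std_monthType AS t3
LEFT JOIN total_stud AS t2 ON t3.visit_month = t2.visit_month
ORDER BY t3.visit_month, t3.student_type
"""

result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

Each output row carries both the period total and the category count, so the share is a plain division on one row and no LAG is needed.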

If you have any questions about the code, feel free to ask.
