How to filter data in SQL based on percentiles

Question

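I have 2 tables. The first contains customer information such as id, age and name. The second contains their IDs, information about the products they bought, and purchase_date (the dates range from 2016 to 2018).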

Table 1
-------
customer_id
customer_age
customer_name

Table 2
-------
customer_id
product
purchase_date

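The result I want is a table with the customer_name and product for purchases made in 2017 by customers who are above the 75th percentile of the customers who made purchases in 2016.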
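A minimal sketch of the two tables described above, assuming SQLite through Python's built-in sqlite3 module. The names table1/table2 follow the question; the column types and sample rows are hypothetical. Dates are stored as ISO 'YYYY-MM-DD' text, so a half-open range (>= '2016-01-01' AND < '2017-01-01') selects exactly one calendar year.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create table1/table2 from the question in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (
            customer_id   INTEGER PRIMARY KEY,
            customer_age  INTEGER,
            customer_name TEXT
        );
        CREATE TABLE table2 (
            customer_id   INTEGER REFERENCES table1(customer_id),
            product       TEXT,
            purchase_date TEXT  -- ISO 'YYYY-MM-DD', so text order equals date order
        );
    """)
    # Hypothetical sample rows.
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                     [(1, 25, "Ann"), (2, 40, "Bob"), (3, 58, "Cai")])
    conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                     [(1, "pen", "2016-03-01"), (2, "ink", "2016-12-31"),
                      (2, "pad", "2017-07-09"), (3, "mug", "2018-01-20")])
    return conn

conn = build_demo_db()
# Purchases made in 2016, selected with a half-open date range.
purchases_2016 = conn.execute("""
    SELECT COUNT(*) FROM table2
    WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01'
""").fetchone()[0]
print(purchases_2016)  # 2
```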

Answer 1

Depending on your SQL dialect, you can get quartiles with the more general ntile analytic function. This essentially adds a new column to your query.

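-- Minimum age of the top quartile among customers who purchased in 2016
SELECT MIN(customer_age) AS min_age FROM (
SELECT customer_id, customer_age, ntile(4) OVER(ORDER BY customer_age) AS q4 FROM table1
WHERE customer_id IN (
SELECT customer_id FROM table2
WHERE purchase_date >= '2016-01-01' AND purchase_date < '2017-01-01')
) q
WHERE q4 = 4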

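This returns the minimum age of the customers in the fourth quartile, which can then be used in a subquery to filter the customers who made purchases in 2017.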
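A sketch of how that cutoff might be plugged into the 2017 filter, assuming SQLite 3.25 or later (for window functions) via Python's sqlite3. The sample data is hypothetical, and both year filters use half-open date ranges on ISO date text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (customer_id INTEGER PRIMARY KEY,
                         customer_age INTEGER, customer_name TEXT);
    CREATE TABLE table2 (customer_id INTEGER, product TEXT, purchase_date TEXT);
""")
# Hypothetical data: customers 1..8, ages 20, 25, ..., 55, all bought in 2016.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                 [(i, 15 + 5 * i, f"cust{i}") for i in range(1, 9)])
conn.executemany("INSERT INTO table2 VALUES (?, 'basket', '2016-05-01')",
                 [(i,) for i in range(1, 9)])
# 2017 purchases by one young and two older customers.
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)",
                 [(2, "tea", "2017-02-03"), (7, "pen", "2017-06-10"),
                  (8, "ink", "2017-11-30")])

QUERY = """
SELECT t1.customer_name, t2.product
FROM table1 t1
JOIN table2 t2 ON t2.customer_id = t1.customer_id
WHERE t2.purchase_date >= '2017-01-01' AND t2.purchase_date < '2018-01-01'
  AND t1.customer_age >= (
        -- lowest age in the 4th NTILE bucket of 2016 buyers
        SELECT MIN(customer_age) FROM (
            SELECT customer_age, NTILE(4) OVER (ORDER BY customer_age) AS q4
            FROM table1
            WHERE customer_id IN (SELECT customer_id FROM table2
                                  WHERE purchase_date >= '2016-01-01'
                                    AND purchase_date <  '2017-01-01')
        ) q
        WHERE q4 = 4)
ORDER BY t1.customer_name
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('cust7', 'pen'), ('cust8', 'ink')]
```

Here the cutoff is 50 (customers 7 and 8 form bucket 4), so the 2017 purchase by the 25-year-old customer 2 is filtered out.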

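The argument to ntile is the number of buckets you want to split the rows into. In this case, 75%+ corresponds to the fourth quartile, so 4 buckets works. The OVER() clause specifies what you want to order by (customer_age in our example), and it also lets us partition (group) the data if we want to create separate rankings for different years or countries/regions.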
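To illustrate the bucket assignment and PARTITION BY, a small sketch in SQLite 3.25+ via Python's sqlite3; the helper names and values are hypothetical.

```python
import sqlite3

def ntile_buckets(values, n=4):
    """Return (value, bucket) pairs assigned by NTILE(n) OVER (ORDER BY value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    sql = f"SELECT v, NTILE({int(n)}) OVER (ORDER BY v) FROM t ORDER BY v"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

def ntile_by_group(pairs, n=2):
    """Same, but ranked separately within each group via PARTITION BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (g TEXT, v INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", pairs)
    sql = (f"SELECT g, v, NTILE({int(n)}) OVER (PARTITION BY g ORDER BY v) "
           "FROM t ORDER BY g, v")
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

# 10 rows into 4 buckets: SQLite puts the larger groups first (3, 3, 2, 2).
print([b for _, b in ntile_buckets(range(1, 11))])  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
print(ntile_by_group([("north", 30), ("north", 50), ("south", 20), ("south", 40)]))
# [('north', 30, 1), ('north', 50, 2), ('south', 20, 1), ('south', 40, 2)]
```

So with 10 customers, bucket 4 holds only the top 2 rows (20%). The "fourth quartile" is approximate whenever the row count is not a multiple of 4.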

Answer 2

Age is a terrible field to store in a database. It changes every day. You should store the date of birth or something similar.
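If the table stored a birth date instead, the age could be derived at query time. A minimal sketch in SQLite via Python's sqlite3, assuming a hypothetical ISO 'YYYY-MM-DD' birth_date value: subtract the years, then subtract one more if the reference date's month-day falls before the birthday.

```python
import sqlite3

# Whole years from :birth to :ref. In SQLite a comparison yields 0 or 1,
# so the last term subtracts 1 when the birthday hasn't occurred yet in :ref's year.
AGE_SQL = """
SELECT CAST(strftime('%Y', :ref) AS INTEGER)
     - CAST(strftime('%Y', :birth) AS INTEGER)
     - (strftime('%m-%d', :ref) < strftime('%m-%d', :birth))
"""

def age_on(birth_date: str, ref_date: str) -> int:
    """Age in whole years on ref_date for someone born on birth_date."""
    conn = sqlite3.connect(":memory:")
    (age,) = conn.execute(AGE_SQL, {"ref": ref_date, "birth": birth_date}).fetchone()
    conn.close()
    return age

print(age_on("2000-06-15", "2017-06-14"))  # 16
print(age_on("2000-06-15", "2017-06-15"))  # 17
```

For the question, one could evaluate the age at a fixed date such as '2016-12-31', so that the percentile does not drift from day to day.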

There are several ways to get the 75th-percentile value for 2016. I often go with row_number() and count(*):

select min(customer_age)
from (select c.*,
             row_number() over (order by customer_age) as seqnum,
             count(*) over () as cnt
      from customers c
      where exists (select 1
                    from customer_products cp
                    where cp.customer_id = c.customer_id and
                          cp.purchase_date >= '2016-01-01' and
                          cp.purchase_date < '2017-01-01'
                   )
     ) c
where seqnum >= 0.75 * cnt;
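To see which cutoff this picks, a small sketch running the same row_number()/count(*) logic in SQLite 3.25+ via Python's sqlite3 on hypothetical ages. The 2016 EXISTS filter is left out so that only the percentile step is shown.

```python
import sqlite3

def cutoff_age(ages, fraction=0.75):
    """Smallest age whose ROW_NUMBER is at least fraction * COUNT(*)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_age INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?)", [(a,) for a in ages])
    (age,) = conn.execute("""
        SELECT MIN(customer_age)
        FROM (SELECT customer_age,
                     ROW_NUMBER() OVER (ORDER BY customer_age) AS seqnum,
                     COUNT(*)     OVER ()                      AS cnt
              FROM customers) c
        WHERE seqnum >= ? * cnt
    """, (fraction,)).fetchone()
    conn.close()
    return age

ages = [20, 25, 30, 35, 40, 45, 50, 55]
print(cutoff_age(ages))  # 45: seqnum >= 0.75 * 8 = 6 keeps rows 6, 7, 8
```

With these eight ages the result is 45, whereas ntile(4) would give 50 for the same data, because its fourth bucket holds only rows 7 and 8. On small tables the two approaches can therefore pick slightly different cutoffs.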

Then use it in the query for 2017:

with a2016 as (
      select min(customer_age) as customer_age
      from (select c.*,
                   row_number() over (order by customer_age) as seqnum,
                   count(*) over () as cnt
            from customers c
            where exists (select 1
                          from customer_products cp
                          where cp.customer_id = c.customer_id and
                                cp.purchase_date >= '2016-01-01' and
                                cp.purchase_date < '2017-01-01'
                         )
            ) c
      where seqnum >= 0.75 * cnt
     )
select c.*, cp.product
from customers c join
     customer_products cp
     on cp.customer_id = c.customer_id and
        cp.purchase_date >= '2017-01-01' and
        cp.purchase_date < '2018-01-01' join
     a2016 a
     on c.customer_age >= a.customer_age;
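A runnable sketch of the full query above in SQLite 3.25+ via Python's sqlite3, assuming hypothetical tables customers(customer_id, customer_age, customer_name) and customer_products(customer_id, product, purchase_date) with made-up rows. It selects only customer_name and product, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            customer_age INTEGER, customer_name TEXT);
    CREATE TABLE customer_products (customer_id INTEGER, product TEXT,
                                    purchase_date TEXT);
""")
# Hypothetical data: customers 1..8, ages 20..55, all bought in 2016.
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(i, 15 + 5 * i, f"cust{i}") for i in range(1, 9)])
conn.executemany("INSERT INTO customer_products VALUES (?, 'basket', '2016-05-01')",
                 [(i,) for i in range(1, 9)])
conn.executemany("INSERT INTO customer_products VALUES (?, ?, ?)",
                 [(2, "tea", "2017-02-03"), (6, "mug", "2017-04-01"),
                  (7, "pen", "2017-06-10"), (8, "ink", "2017-11-30")])

rows = conn.execute("""
WITH a2016 AS (                       -- 75th-percentile age of 2016 buyers
    SELECT MIN(customer_age) AS customer_age
    FROM (SELECT c.*,
                 ROW_NUMBER() OVER (ORDER BY customer_age) AS seqnum,
                 COUNT(*) OVER () AS cnt
          FROM customers c
          WHERE EXISTS (SELECT 1 FROM customer_products cp
                        WHERE cp.customer_id = c.customer_id
                          AND cp.purchase_date >= '2016-01-01'
                          AND cp.purchase_date <  '2017-01-01')) c
    WHERE seqnum >= 0.75 * cnt
)
SELECT c.customer_name, cp.product
FROM customers c
JOIN customer_products cp
  ON cp.customer_id = c.customer_id
 AND cp.purchase_date >= '2017-01-01'
 AND cp.purchase_date <  '2018-01-01'
JOIN a2016 a
  ON c.customer_age >= a.customer_age
ORDER BY c.customer_id
""").fetchall()
print(rows)  # [('cust6', 'mug'), ('cust7', 'pen'), ('cust8', 'ink')]
```

The cutoff here is 45, so the 2017 purchases of customers 6, 7 and 8 are kept and customer 2's is dropped.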
