SQL: How to get the relative position of values in a given row

Question · Votes: 0 · Answers: 4

I'm working with a Postgres table named orders that looks like this:

user_id   product       order_date
1         pants         7/1/2022
2         shirt         6/1/2022
1         socks         3/17/2023
3         pants         2/17/2023
4         shirt         3/13/2023
2         pants         8/15/2022
1         hat           4/15/2022
5         hat           3/14/2023
2         socks         12/3/2022
3         shirt         4/15/2023
4         socks         1/15/2023
4         pants         4/19/2023
5         shirt         5/2/2023
5         belt          5/15/2023


Here is a DB Fiddle with the data: https://www.db-fiddle.com/f/uNGjP7gpKwdPGrJ7XmT7k3/2

I am producing a table that shows the order in which each customer placed their orders:

user_id   first_order   second_order    third_order
1         hat           pants           socks
2         shirt         pants           socks
3         pants         shirt           <null>
4         socks         shirt           pants
5         hat           shirt           belt            

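So, for example, customer 1 first bought a hat, then pants, and finally socks.

I'd like to set some kind of indicator at the row level that tells me whether a given customer bought one product before another. For example, I want to flag whether the customer bought a shirt before buying pants.

The desired output looks like this: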

user_id   first_order   second_order    third_order     shirt_before_pants
1         hat           pants           socks           false
2         shirt         pants           socks           true
3         pants         shirt           <null>          false
4         socks         shirt           pants           true
5         hat           shirt           belt            false

Is there a way to get the relative position of given values at the row level?

Thanks for your help, -Rachel

sql postgresql pivot aggregate-functions
4 answers

2 votes

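We can use row_number() to enumerate each customer's orders, then use conditional aggregation to generate the new columns. To check whether one product was bought before another, we can compare the earliest order dates of the two products: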

select user_id,
    max(product) filter(where rn = 1) product_1,
    max(product) filter(where rn = 2) product_2,
    max(product) filter(where rn = 3) product_3,
    ( 
          min(order_date) filter(where product = 'shirt') 
        < min(order_date) filter(where product = 'pants')
    ) shirt_before_pants
from (
    select o.*, row_number() over(partition by user_id order by order_date) rn
    from orders o
) o
group by user_id
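
One thing to watch with a comparison like this: when a customer never bought one of the two products, the corresponding min(...) is NULL, so the comparison yields NULL rather than false. A minimal sketch of one way to get the false values shown in the question is to wrap the comparison in coalesce. The temp table name orders_demo and the column types (int, text, date) are assumptions made for illustration; the rows are copied from the question with the dates rewritten as ISO literals.

```sql
-- Hypothetical demo table; column types are assumed, not taken from the fiddle.
CREATE TEMP TABLE orders_demo (user_id int, product text, order_date date);

INSERT INTO orders_demo (user_id, product, order_date) VALUES
    (1, 'pants', '2022-07-01'),
    (2, 'shirt', '2022-06-01'),
    (1, 'socks', '2023-03-17'),
    (3, 'pants', '2023-02-17'),
    (4, 'shirt', '2023-03-13'),
    (2, 'pants', '2022-08-15'),
    (1, 'hat',   '2022-04-15'),
    (5, 'hat',   '2023-03-14'),
    (2, 'socks', '2022-12-03'),
    (3, 'shirt', '2023-04-15'),
    (4, 'socks', '2023-01-15'),
    (4, 'pants', '2023-04-19'),
    (5, 'shirt', '2023-05-02'),
    (5, 'belt',  '2023-05-15');

SELECT user_id,
       max(product) FILTER (WHERE rn = 1) AS first_order,
       max(product) FILTER (WHERE rn = 2) AS second_order,
       max(product) FILTER (WHERE rn = 3) AS third_order,
       -- NULL (one of the products was never bought) is mapped to false
       coalesce(
             min(order_date) FILTER (WHERE product = 'shirt')
           < min(order_date) FILTER (WHERE product = 'pants'),
           false
       ) AS shirt_before_pants
FROM (
    -- number each customer's orders chronologically
    SELECT o.*, row_number() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
    FROM orders_demo o
) o
GROUP BY user_id
ORDER BY user_id;
```

With the sample rows above, customers 1 and 5 come out as false instead of NULL, which matches the desired output in the question.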
        

0 votes

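This approach uses the window function ROW_NUMBER (DENSE_RANK would also work) to assign a row number to each row within its user_id partition. To determine whether a shirt was bought before pants, we can compare the row numbers generated for those products: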

With cte as (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
  FROM orders
)
select user_id, max(case when rn = 1 then product end) as first_order,
                max(case when rn = 2 then product end) as second_order,
                max(case when rn = 3 then product end) as third_order,
                MAX(case when product = 'shirt' then rn end) 
                < MAX(case when product = 'pants' then rn end) as shirt_before_pants
from cte
GROUP BY user_id;

0 votes

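If ...

  • ... "before" is supposed to mean "immediately before", with no other order in between
  • ... an array of products, instead of a separate column for each, is acceptable
  • ... you have a separate "users" table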
SELECT o.*
FROM   users u
CROSS  JOIN LATERAL (
   SELECT o.user_id
        , array_agg(o.product) AS products
        , bool_or(o.combo) AS shirt_before_pants
   FROM  (
      SELECT o.user_id, o.product::text
           , o.product = 'pants' AND lag(o.product) OVER (ORDER BY o.order_date) = 'shirt' AS combo
      FROM   orders o
      WHERE  o.user_id = u.user_id
      ORDER  BY o.order_date
      LIMIT  3  -- cutoff
      ) o
   GROUP  BY 1
   ) o
ORDER  BY u.user_id;

fiddle

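The beauty of it: to handle a different number of orders per user, you only need to change LIMIT. And 'pants' and 'shirt' need changing in just one place.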

The products in the output array are sorted because of the ORDER BY in the subquery.

The query performs well on a big table with many orders per user if you have an index on orders(user_id, order_date) or, better yet, on orders(user_id, order_date) INCLUDE (product).
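
As a sketch, the covering index mentioned above could be created as follows. The index name is hypothetical, and the statement assumes the orders table from the question already exists; the INCLUDE clause needs PostgreSQL 11 or later.

```sql
-- Hypothetical index name; INCLUDE (product) lets the lateral subquery
-- read product from the index itself (index-only scan) where possible.
CREATE INDEX IF NOT EXISTS orders_user_id_order_date_idx
    ON orders (user_id, order_date) INCLUDE (product);
```

On older PostgreSQL versions, a plain index on (user_id, order_date) is the fallback named in the answer.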

If you don't have a users table (you should have one), create it like this:

CREATE TABLE users AS
SELECT DISTINCT user_id
FROM   orders
ORDER  BY user_id;  -- optional

Faster ways to build that list also exist.


0 votes

The array_position function might be helpful here:

WITH 

first_orders AS (
  SELECT "user_id", "product", MIN("order_date") AS "order_date"
  FROM "orders"
  GROUP BY "user_id", "product"),

product_arrays AS (
  SELECT "user_id", 
    array_agg(product ORDER BY order_date) AS "products"
  FROM first_orders
  GROUP BY "user_id")
  
SELECT * 
FROM product_arrays
WHERE array_position(products, 'shirt') 
         < array_position(products, 'pants')

Or the following works just as well:

WITH 

product_arrays AS (
  SELECT "user_id", 
    array_agg(product ORDER BY order_date) AS "products"
  FROM orders
  GROUP BY "user_id")
  
SELECT * 
FROM product_arrays
WHERE array_position(products, 'shirt') 
         < array_position(products, 'pants')
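
Both queries above filter down to the customers who bought a shirt before pants. If a true/false column per customer is wanted instead, as in the question, one possible sketch moves the comparison into the select list. The inline orders_demo rows are a small hypothetical subset of the question's data, and coalesce is used here on the assumption that "never bought one of the two" should read as false. Because array_position returns the first matching position, repeat purchases of a product do not need to be removed first.

```sql
WITH orders_demo (user_id, product, order_date) AS (
    -- Hypothetical inline sample, a subset of the rows in the question
    VALUES (2, 'shirt', DATE '2022-06-01'),
           (2, 'pants', DATE '2022-08-15'),
           (3, 'pants', DATE '2023-02-17'),
           (3, 'shirt', DATE '2023-04-15'),
           (5, 'hat',   DATE '2023-03-14')
),
product_arrays AS (
    -- one chronologically ordered array of products per customer
    SELECT user_id,
           array_agg(product ORDER BY order_date) AS products
    FROM orders_demo
    GROUP BY user_id
)
SELECT user_id,
       products,
       -- array_position returns NULL when the product is absent;
       -- coalesce turns the resulting NULL comparison into false
       coalesce(
             array_position(products, 'shirt')
           < array_position(products, 'pants'),
           false
       ) AS shirt_before_pants
FROM product_arrays
ORDER BY user_id;
```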