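I have an orders table: each order (transaction_id) has a client_id and an item_id. I need to find the orders that contain exactly the same set of items in exactly the same quantities.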
```python
sql_1 = """
CREATE TABLE IF NOT EXISTS orders (
client_id varchar(10),
item_id varchar(10),
transaction_id varchar(10)
)
"""
con.execute(sql_1)
sql_2 = """
INSERT INTO orders values
('CL1111','111','1001'),
('CL1111','222','1001'),
('CL1111','333','1001'),
('CL2222','111','1002'),
('CL2222','222','1002'),
('CL2222','333','1002'),
('CL3333','111','1003'),
('CL3333','222','1003'),
('CL3333','333','1003'),
('CL3333','444','1003'),
('CL4444','111','1004'),
('CL4444','222','1004'),
('CL4444','333','1004'),
('CL5555','111','1005'),
('CL5555','222','1005'),
('CL6666','111','1006'),
('CL6666','222','1006'),
('CL6666','333','1007')
"""
con.execute(sql_2)
```
This is what I tried:
```python
sql = """
with
a as
(
select
client_id, item_id, transaction_id,
row_number() over (partition by transaction_id order by item_id) as rn
from orders
)
select
a.*
, b.transaction_id as b_tr_id
from
a left join a as b
on a.rn = b.rn
and a.item_id = b.item_id
and a.transaction_id != b.transaction_id
"""
df_SQL = pd.read_sql(sql, con)
df_SQL
```
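But this doesn't solve my problem. I still don't know how to get the orders that are identical in item_id, in quantity and in their overall set of items.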
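Your sample code doesn't include quantity! Here is a query that returns one row for each distinct combination of items and quantities, together with the list of transactions that share it: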
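```python
sql = """
with orders_mod as
(
select distinct
"transaction_id",
-- one list per order: partition by transaction_id, concatenate in item_id order,
-- and use the whole partition as the frame so every row gets the complete list
string_agg("item_id" || ' ' || "quantity"::text, ',') over (
    partition by "transaction_id"
    order by "item_id"
    rows between unbounded preceding and unbounded following
) as "list_of_ordered_item_ids"
from orders
)
select distinct
string_agg("transaction_id", ',') over (partition by "list_of_ordered_item_ids") as "list_of_the_same_orders",
"list_of_ordered_item_ids"
from orders_mod
;
"""
df_SQL = pd.read_sql(sql, con)
df_SQL
```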
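This answer builds one signature per order and then returns one row per distinct signature. You can check that idea outside the database with a minimal Python sketch. It assumes the rows are already loaded as (client_id, item_id, transaction_id, quantity) tuples. `ORDERS` copies the test data with quantities used in the answers, and `same_order_groups` is a hypothetical helper name:

```python
from collections import defaultdict

# Test data with quantities: (client_id, item_id, transaction_id, quantity)
ORDERS = [
    ("CL1111", "111", "1001", 1), ("CL1111", "222", "1001", 2), ("CL1111", "333", "1001", 1),
    ("CL2222", "111", "1002", 2), ("CL2222", "222", "1002", 1), ("CL2222", "333", "1002", 1),
    ("CL3333", "111", "1003", 1), ("CL3333", "222", "1003", 2), ("CL3333", "333", "1003", 1),
    ("CL3333", "444", "1003", 1),
    ("CL4444", "111", "1004", 1), ("CL4444", "222", "1004", 2), ("CL4444", "333", "1004", 1),
    ("CL5555", "111", "1005", 1), ("CL5555", "222", "1005", 2),
    ("CL6666", "111", "1006", 1), ("CL6666", "222", "1006", 2), ("CL6666", "333", "1007", 1),
]


def same_order_groups(rows):
    """Map each order signature to the comma-separated transaction_ids that share it."""
    lines = defaultdict(list)
    for _client, item, tx, qty in rows:
        lines[tx].append((item, qty))
    by_signature = defaultdict(list)
    for tx, items in lines.items():
        # Sorting by item_id fixes the concatenation order, so identical
        # orders yield identical strings whatever order the rows arrive in.
        signature = ",".join(f"{item} {qty}" for item, qty in sorted(items))
        by_signature[signature].append(tx)
    # One entry per distinct signature, like the outer SELECT DISTINCT.
    return {sig: ",".join(sorted(txs)) for sig, txs in by_signature.items()}
```

On this data the sketch pairs 1001 with 1004 and 1005 with 1006. Every other order forms a group of its own.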
Please check this query; it returns the correct rows in dbfiddle:

```sql
with agg as (
  select client_id,
         -- order by item_id keeps the two arrays aligned and comparable across clients
         array_agg(array[item_id] order by item_id) ai,
         array_agg(array[quantity] order by item_id) aq
  from orders
  group by client_id
  having count(distinct transaction_id) = 1
)
select client_id, ai items, aq quantities
from (
  select client_id, ai, aq, count(1) over (partition by ai, aq) cnt
  from agg
) c
where cnt > 1
```
client_id | items | quantities |
---|---|---|
CL1111 | {{111},{222},{333}} | {{1},{2},{1}} |
CL4444 | {{111},{222},{333}} | {{1},{2},{1}} |
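I'm not sure what you want for clients with more than one transaction. In your example, clients CL5555 and CL6666 are not treated as a match, even though their transactions 1005 and 1006 are identical. So this query compares only clients that have exactly one distinct transaction.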
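Here is a minimal Python trace of this array_agg query on the test data, with the same restriction to single-transaction clients. Sorting each client's (item_id, quantity) pairs stands in for ordering both aggregated arrays by item_id. `ORDERS` and `matching_single_tx_clients` are illustrative names, not part of the answer:

```python
from collections import Counter, defaultdict

# Test data with quantities: (client_id, item_id, transaction_id, quantity)
ORDERS = [
    ("CL1111", "111", "1001", 1), ("CL1111", "222", "1001", 2), ("CL1111", "333", "1001", 1),
    ("CL2222", "111", "1002", 2), ("CL2222", "222", "1002", 1), ("CL2222", "333", "1002", 1),
    ("CL3333", "111", "1003", 1), ("CL3333", "222", "1003", 2), ("CL3333", "333", "1003", 1),
    ("CL3333", "444", "1003", 1),
    ("CL4444", "111", "1004", 1), ("CL4444", "222", "1004", 2), ("CL4444", "333", "1004", 1),
    ("CL5555", "111", "1005", 1), ("CL5555", "222", "1005", 2),
    ("CL6666", "111", "1006", 1), ("CL6666", "222", "1006", 2), ("CL6666", "333", "1007", 1),
]


def matching_single_tx_clients(rows):
    """Clients with one transaction whose item and quantity arrays equal another such client's."""
    txs = defaultdict(set)
    lines = defaultdict(list)
    for client, item, tx, qty in rows:
        txs[client].add(tx)
        lines[client].append((item, qty))
    agg = {}
    for client, ls in lines.items():
        if len(txs[client]) != 1:  # HAVING count(distinct transaction_id) = 1
            continue
        ls = sorted(ls)  # same item_id order for both arrays
        agg[client] = (tuple(i for i, _ in ls), tuple(q for _, q in ls))
    # count(1) over (partition by ai, aq) ... where cnt > 1
    cnt = Counter(agg.values())
    return sorted(client for client, key in agg.items() if cnt[key] > 1)
```

The sketch returns CL1111 and CL4444, matching the table. CL5555 is dropped because no other single-transaction client has the same lines. CL6666 is dropped because it has two transactions.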
Try this query. The ORDER BY clause inside the aggregate is necessary for reliable results.
```sql
with transactions as(
select client_id,transaction_id
,string_agg( (item_id ||'('||cast(quantity as varchar) || ')') ,',' order by item_id) itemlist
from orders
group by client_id,transaction_id
)
select *
,row_number()over(partition by itemlist order by client_id,transaction_id) rn
from transactions
order by itemlist
```
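With the test data, the query finds 2 groups of identical orders: transactions 1001 and 1004, and transactions 1005 and 1006. These are not the ones marked green in your picture.
The result is: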
client_id | transaction_id | itemlist | rn |
---|---|---|---|
CL5555 | 1005 | 111(1),222(2) | 1 |
CL6666 | 1006 | 111(1),222(2) | 2 |
CL1111 | 1001 | 111(1),222(2),333(1) | 1 |
CL4444 | 1004 | 111(1),222(2),333(1) | 2 |
CL3333 | 1003 | 111(1),222(2),333(1),444(1) | 1 |
CL2222 | 1002 | 111(2),222(1),333(1) | 1 |
CL6666 | 1007 | 333(1) | 1 |
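The following minimal Python sketch shows why the ORDER BY inside string_agg matters and how rn numbers the duplicates. It mirrors the `transactions` CTE on the test data; `item_lists` and `numbered` are hypothetical helper names. If lines are concatenated in whatever order they arrive, two identical orders can get different strings. Sorting by item_id first, as `order by item_id` does, makes them equal:

```python
from collections import defaultdict

# Test data with quantities: (client_id, item_id, transaction_id, quantity)
ORDERS = [
    ("CL1111", "111", "1001", 1), ("CL1111", "222", "1001", 2), ("CL1111", "333", "1001", 1),
    ("CL2222", "111", "1002", 2), ("CL2222", "222", "1002", 1), ("CL2222", "333", "1002", 1),
    ("CL3333", "111", "1003", 1), ("CL3333", "222", "1003", 2), ("CL3333", "333", "1003", 1),
    ("CL3333", "444", "1003", 1),
    ("CL4444", "111", "1004", 1), ("CL4444", "222", "1004", 2), ("CL4444", "333", "1004", 1),
    ("CL5555", "111", "1005", 1), ("CL5555", "222", "1005", 2),
    ("CL6666", "111", "1006", 1), ("CL6666", "222", "1006", 2), ("CL6666", "333", "1007", 1),
]


def item_lists(rows, ordered=True):
    """(client_id, transaction_id) -> 'item(qty),...' like the transactions CTE."""
    lines = defaultdict(list)
    for client, item, tx, qty in rows:
        lines[(client, tx)].append((item, qty))
    result = {}
    for key, ls in lines.items():
        # ordered=True mimics ORDER BY item_id inside string_agg;
        # ordered=False keeps whatever order the rows arrived in.
        seq = sorted(ls) if ordered else ls
        result[key] = ",".join(f"{item}({qty})" for item, qty in seq)
    return result


def numbered(rows):
    """Add rn = row_number() over (partition by itemlist order by client_id, transaction_id)."""
    counters = defaultdict(int)
    out = []
    for (client, tx), itemlist in sorted(item_lists(rows).items()):
        counters[itemlist] += 1
        out.append((client, tx, itemlist, counters[itemlist]))
    # Final ORDER BY itemlist; the stable sort keeps client_id order among ties.
    return sorted(out, key=lambda row: row[2])
```

For example, feed the CL4444 lines in reverse order. `item_lists(rows, ordered=False)` then produces different strings for 1001 and 1004, while the sorted version still matches them. `numbered(ORDERS)` reproduces the rows and rn values of the result table.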
Test data:
```sql
-- assumed table definition: the question's columns plus an integer quantity column
CREATE TABLE orders (
client_id varchar(10),
item_id varchar(10),
transaction_id varchar(10),
quantity int
);
INSERT INTO orders (client_id,item_id,transaction_id,quantity)
values
('CL1111','111','1001',1),
('CL1111','222','1001',2),
('CL1111','333','1001',1),
('CL2222','111','1002',2),
('CL2222','222','1002',1),
('CL2222','333','1002',1),
('CL3333','111','1003',1),
('CL3333','222','1003',2),
('CL3333','333','1003',1),
('CL3333','444','1003',1),
('CL4444','111','1004',1),
('CL4444','222','1004',2),
('CL4444','333','1004',1),
('CL5555','111','1005',1),
('CL5555','222','1005',2),
('CL6666','111','1006',1),
('CL6666','222','1006',2),
('CL6666','333','1007',1)
;
```
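The queries above rely on PostgreSQL aggregates (string_agg, array_agg). A portable cross-check can avoid building strings or arrays altogether. This sketch swaps in a different technique: a self-join that counts matching (item_id, quantity) lines between two orders of the same size. It runs against the test data in Python's built-in sqlite3. It assumes item_id is unique within a transaction, and `identical_pairs` is a hypothetical helper name:

```python
import sqlite3

# Test data with quantities: (client_id, item_id, transaction_id, quantity)
ROWS = [
    ("CL1111", "111", "1001", 1), ("CL1111", "222", "1001", 2), ("CL1111", "333", "1001", 1),
    ("CL2222", "111", "1002", 2), ("CL2222", "222", "1002", 1), ("CL2222", "333", "1002", 1),
    ("CL3333", "111", "1003", 1), ("CL3333", "222", "1003", 2), ("CL3333", "333", "1003", 1),
    ("CL3333", "444", "1003", 1),
    ("CL4444", "111", "1004", 1), ("CL4444", "222", "1004", 2), ("CL4444", "333", "1004", 1),
    ("CL5555", "111", "1005", 1), ("CL5555", "222", "1005", 2),
    ("CL6666", "111", "1006", 1), ("CL6666", "222", "1006", 2), ("CL6666", "333", "1007", 1),
]

# Two orders are identical when they have the same number of lines and every
# line of one matches a line of the other (assumes no repeated item_id per order).
SQL = """
WITH sizes AS (
    SELECT transaction_id, COUNT(*) AS n FROM orders GROUP BY transaction_id
)
SELECT a.transaction_id AS tx_a, b.transaction_id AS tx_b
FROM orders a
JOIN orders b
  ON a.item_id = b.item_id
 AND a.quantity = b.quantity
 AND a.transaction_id < b.transaction_id
JOIN sizes sa ON sa.transaction_id = a.transaction_id
JOIN sizes sb ON sb.transaction_id = b.transaction_id
WHERE sa.n = sb.n
GROUP BY a.transaction_id, b.transaction_id, sa.n
HAVING COUNT(*) = sa.n
ORDER BY tx_a, tx_b
"""


def identical_pairs(rows):
    """Return (tx_a, tx_b) pairs of identical orders using an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (client_id TEXT, item_id TEXT, transaction_id TEXT, quantity INTEGER)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    pairs = con.execute(SQL).fetchall()
    con.close()
    return pairs
```

Because nothing is concatenated, the result does not depend on aggregation order. The trade-off is a join whose size grows with the number of orders that share items.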