Efficient way to combine SELECT DISTINCT, ORDER BY and a small LIMIT


My table is:

create table transactions (
    transaction_id  integer  not null,
    transaction_timestamp integer  not null,
    input_index     smallint,
    output_index    smallint not null,
    from_id         integer,
    to_id           integer  not null,
    input_value     real,
    output_value    real     not null,
    constraint unique_transactions
        unique (transaction_id, from_id, to_id)
);

It has the following indexes:

create index idx_transactions_from_id_block_timestamp
    on transactions (from_id asc, transaction_timestamp desc);

create index idx_transactions_to_id_block_timestamp
    on transactions (to_id asc, transaction_timestamp desc);

create index idx_transactions_transaction_id
    on transactions (transaction_id);

create index idx_transactions_block_timestamp
    on transactions (transaction_timestamp desc);

The query I would ideally like to run is:

select distinct on (transaction_id,output_index) *
from transactions
where to_id = 1000
and transaction_timestamp between 1691193600 AND 1711929600
order by transaction_timestamp desc
limit 10

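It should give me the 10 most recent unique (transaction_id, output_index) pairs (I don't care which from_id and input_index are kept).

This direct approach doesn't work, because Postgres requires the ORDER BY to start with the DISTINCT ON columns. ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions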

Putting those columns first reorders my rows and picks the top 10 rows with the highest transaction_id, which is not what I want.
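
A query that satisfies that rule might look like the following sketch. The exact sort directions are an assumption, since the rewritten query itself is not shown:

```sql
-- Accepted by Postgres, because ORDER BY now starts with the DISTINCT ON columns.
-- But LIMIT 10 is applied in (transaction_id, output_index) order,
-- so it returns the rows with the highest transaction_id,
-- not the 10 most recent ones.
select distinct on (transaction_id, output_index) *
from transactions
where to_id = 1000
and transaction_timestamp between 1691193600 and 1711929600
order by transaction_id desc, output_index, transaction_timestamp desc  -- directions assumed
limit 10;
```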

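Is there an efficient way to do this that takes advantage of the low limit, so that it hopefully doesn't have to go through the millions of rows in the table?

I tried the following queries, but both ended up taking too long, because they have to process every row that matches the filter instead of exploiting the small LIMIT 10.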

Query 1:

WITH RankedTransactions AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY transaction_id, output_index ORDER BY transaction_timestamp DESC) AS rn
        FROM transactions
        WHERE to_id = 1000
          and transaction_timestamp between 1691193600 AND 1711929600
    )
    SELECT transaction_id,
           input_index,
           output_index,
           transaction_timestamp,
           from_id,
           to_id,
           input_value,
           output_value
    FROM RankedTransactions
    WHERE rn = 1
    ORDER BY transaction_timestamp DESC
    LIMIT 10;

Query 2:

SELECT *
FROM (
    SELECT DISTINCT ON (transaction_id, output_index) *
    FROM transactions
    WHERE to_id = 1000
    and transaction_timestamp between 1691193600 AND 1711929600
    ORDER BY transaction_id, output_index DESC
) AS latest_transactions
ORDER BY transaction_timestamp DESC
LIMIT 10;
1 Answer

This works:

SELECT *
FROM  (
    SELECT DISTINCT ON (transaction_id, output_index) *
    FROM   transactions
    WHERE  to_id = 1000
    AND    transaction_timestamp BETWEEN 1691193600 AND 1711929600
    ORDER  BY transaction_id, output_index DESC, transaction_timestamp DESC  -- !!!
    ) AS latest_transactions
ORDER  BY transaction_timestamp DESC
LIMIT  10;
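
To see why the extra sort key marked with `-- !!!` matters, here is a tiny self-contained sketch. The rows in the VALUES list are made up for illustration and are not taken from your table:

```sql
-- Two rows share the pair (1, 0); one row has the pair (2, 0).
SELECT DISTINCT ON (transaction_id, output_index) *
FROM  (
    VALUES (1, 0, 100, 7)
         , (1, 0, 200, 8)
         , (2, 0, 150, 9)
    ) AS t (transaction_id, output_index, transaction_timestamp, from_id)
ORDER  BY transaction_id, output_index DESC, transaction_timestamp DESC;
-- Keeps (1, 0, 200, 8) and (2, 0, 150, 9):
-- the row with the latest timestamp in each group.
```

Without transaction_timestamp DESC as a trailing sort key, which row of each group survives is unpredictable, so the outer ORDER BY could end up sorting on an arbitrary timestamp per pair.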

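The best query (and index) depends on how many rows, and how many duplicates among them, to expect per basic selection, i.e. after filtering with WHERE to_id = 1000 AND transaction_timestamp BETWEEN 1691193600 AND 1711929600.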

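If only a modest number of rows qualify, then with the support of an index on (to_id, transaction_timestamp DESC) (which you seem to have?), this query is probably all you need.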
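
One way to check whether that index is actually used, and how many rows the filter really yields, is to look at the execution plan. This is only a sketch of the check, not a result; the plan you get depends on your data and Postgres version:

```sql
-- Note: EXPLAIN ANALYZE really executes the query, so it takes as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM  (
    SELECT DISTINCT ON (transaction_id, output_index) *
    FROM   transactions
    WHERE  to_id = 1000
    AND    transaction_timestamp BETWEEN 1691193600 AND 1711929600
    ORDER  BY transaction_id, output_index DESC, transaction_timestamp DESC
    ) AS latest_transactions
ORDER  BY transaction_timestamp DESC
LIMIT  10;
```

In the output, look for a scan on idx_transactions_to_id_block_timestamp and compare the actual row count of that node with the 10 rows finally returned.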

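With a large number of qualifying rows and/or many duplicates on (transaction_id, output_index), things get more complicated. Especially since you (1) already have a range condition in the base filter and (2) want a mixed sort order of (transaction_id ASC, output_index DESC), which makes it hard to emulate an index skip scan...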
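
One possible workaround, offered here only as a hedged sketch and not as part of the query above, is to over-fetch: read a bounded number of the most recent rows via the (to_id, transaction_timestamp DESC) index, and de-duplicate only those. The inner LIMIT of 1000 is a hypothetical value that would have to be tuned to how many duplicates per pair you expect:

```sql
SELECT *
FROM  (
    SELECT DISTINCT ON (transaction_id, output_index) *
    FROM  (
        -- Bounded read of the newest rows; 1000 is an assumed, tunable value.
        SELECT *
        FROM   transactions
        WHERE  to_id = 1000
        AND    transaction_timestamp BETWEEN 1691193600 AND 1711929600
        ORDER  BY transaction_timestamp DESC
        LIMIT  1000
        ) AS recent
    ORDER  BY transaction_id, output_index DESC, transaction_timestamp DESC
    ) AS latest_transactions
ORDER  BY transaction_timestamp DESC
LIMIT  10;
```

The result can only be trusted if the fetched rows contain at least 10 distinct (transaction_id, output_index) pairs. If fewer than 10 rows come back, the inner LIMIT was too small and the query has to be repeated with a larger value.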
