How can I improve the performance of a time-series query?


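I have the following query, which reports machine performance in 8-hour segments (the segment size can be changed to 1 hour, 1 week, 1 month, and so on). My table has 2 million rows and the query takes 10 seconds to run. Is that a reasonable speed, or can it be improved?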

I have:

  • 8 GB RAM
  • Intel Haswell CPU, 4 cores
  • PostgreSQL 14.2
  • TimescaleDB 2.6.1
  • shared_buffers = 1024MB
  • temp_buffers = 16MB
  • work_mem = 64MB

The tbl_pieza table is a hypertable.

create table tbl_pieza
(
    id_nu_pieza             integer                             not null,
    id_nu_orden_fabricacion integer,
    id_nu_referencia        integer,
    id_nu_operacion         integer,
    id_nu_maquina           integer,
    id_nu_usuario           integer,
    ind_paro                integer,
    ind_validada            integer   default 0,
    nu_segundos             integer,
    dtm_inicio_at           timestamp default CURRENT_TIMESTAMP not null,
    dtm_fin_at              timestamp,
    ind_estatus             integer   default 1,
    dtm_create_at           timestamp,
    dtm_update_at           timestamp default CURRENT_TIMESTAMP,
    ind_retrabajo           integer   default 0,
    primary key (id_nu_pieza, dtm_inicio_at)
);

create index tbl_pieza_dtm_inicio_at_idx
    on tbl_pieza (dtm_inicio_at desc);

create index idx_time_range
    on tbl_pieza (dtm_inicio_at, dtm_fin_at);

WITH Rangos AS (
    SELECT
        generate_series(
            '2023-05-22 16:23:14'::timestamp,
            '2023-05-26 08:23:14'::timestamp,
            '8 hour'::interval
        ) AS inicio,
        generate_series(
            '2023-05-23 00:23:14'::timestamp,
            '2023-05-26 16:23:14'::timestamp,
            '8 hour'::interval
        ) AS fin
),
PiezasPorIntervalo AS (
    SELECT
        r.inicio,
        r.fin,
        p.id_nu_operacion,
        p.id_nu_maquina,
        SUM(
            CASE
                WHEN EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at) = 0 THEN 0
                ELSE GREATEST(0, EXTRACT(epoch FROM LEAST(r.fin, p.dtm_fin_at) - GREATEST(r.inicio, p.dtm_inicio_at)) / EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at))
            END
        ) as PiezasReales
    FROM Rangos r
    JOIN tbl_pieza p ON p.dtm_inicio_at < r.fin AND p.dtm_fin_at > r.inicio
                            AND p.id_nu_usuario in (1,8,11,43,44,45,46,47,48,49)
                            AND p.id_nu_operacion in (84,85,86,87,88,89,90,91,92,93,118,119)
                            AND p.id_nu_referencia in (46,58,59,60)
                            AND p.id_nu_maquina in (1,2,3,8)
    GROUP BY r.inicio, r.fin, p.id_nu_operacion, p.id_nu_maquina
)
SELECT
    p.inicio as fecha_inicio,
    p.fin as fecha_fin,
    p.id_nu_maquina as id_maquina,
    CASE
        WHEN o.ciclo_estimado + o.tiempo_cambio_estimado = 0 THEN 0
        ELSE (p.PiezasReales::decimal / (28800 / (o.ciclo_estimado + o.tiempo_cambio_estimado))) * 100
    END as resultado
FROM PiezasPorIntervalo p
JOIN operacion o ON o.id_operacion = p.id_nu_operacion
ORDER BY fecha_inicio;

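I am running it on a system with the specifications listed above. Any suggestions on how to optimize this query for better performance would be greatly appreciated. Thank you!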
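For readers following the query: the Rangos CTE pairs each interval start with an end exactly 8 hours later by calling generate_series twice. As a sketch only, the same ranges could presumably be produced with a single call, deriving fin from inicio. This assumes the two series are always offset by exactly one step, as they are in the query above; it is a readability change that makes the step easier to parameterize, and is not expected to change the run time by itself.

```sql
-- Sketch: one generate_series call; fin is derived from inicio.
-- Changing the step then means editing the interval literal in two places
-- (or binding it as a single query parameter).
SELECT
    inicio,
    inicio + '8 hour'::interval AS fin
FROM generate_series(
    '2023-05-22 16:23:14'::timestamp,
    '2023-05-26 08:23:14'::timestamp,
    '8 hour'::interval
) AS inicio
ORDER BY inicio;
```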

Output of EXPLAIN (ANALYZE, BUFFERS):

explain output

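PiezasPorIntervalo is the part that takes the longest, so here is an explanation of what it does.

Suppose we have a production table with the following entries: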

Piece | Production Start         | Production End
----- | ------------------------| -------------------------
A     | 2023-05-23 08:00:00     | 2023-05-23 10:00:00
B     | 2023-05-23 09:30:00     | 2023-05-23 12:00:00
C     | 2023-05-23 10:30:00     | 2023-05-23 11:30:00
D     | 2023-05-23 12:00:00     | 2023-05-23 13:30:00

Suppose we want to calculate "PiezasReales" for the specific interval from '2023-05-23 09:00:00' to '2023-05-23 11:00:00'. Here is the step-by-step calculation:

Duration of the overlap between the interval and each piece's production time:

For Piece A: MIN(2023-05-23 11:00:00, 2023-05-23 10:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 08:00:00) = 1 hour
For Piece B: MIN(2023-05-23 11:00:00, 2023-05-23 12:00:00) - MAX(2023-05-23 09:00:00, 2023-05-23 09:30:00) = 1.5 hours
For Piece C: MIN(2023-05-23 11:00:00, 2023-05-23 11:30:00) - MAX(2023-05-23 09:00:00, 2023-05-23 10:30:00) = 0.5 hours
For Piece D: No intersection with the interval, so the duration is 0.

Total production time of each piece:

For Piece A: 2023-05-23 10:00:00 - 2023-05-23 08:00:00 = 2 hours
For Piece B: 2023-05-23 12:00:00 - 2023-05-23 09:30:00 = 2.5 hours
For Piece C: 2023-05-23 11:30:00 - 2023-05-23 10:30:00 = 1 hour
For Piece D: 2023-05-23 13:30:00 - 2023-05-23 12:00:00 = 1.5 hours

Fraction of each piece's production time that falls inside the interval:

For Piece A: 1 hour / 2 hours = 0.5
For Piece B: 1.5 hours / 2.5 hours = 0.6
For Piece C: 0.5 hours / 1 hour = 0.5
For Piece D: 0 (since there's no intersection with the interval).

Value of PiezasReales for the interval:

Sum of the time fractions: 0.5 + 0.6 + 0.5 + 0 = 1.6. So, for this particular interval, counting each piece in proportion to the share of its production time that falls inside the interval, "PiezasReales" comes to 1.6 pieces.
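The steps above can be checked directly in PostgreSQL. The following is a minimal sketch that loads the four sample pieces into throwaway CTEs and applies the same LEAST/GREATEST expression used in PiezasPorIntervalo. The names piezas_demo, rango, fracciones, pieza, and fraccion are illustrative only and are not part of the original schema.

```sql
-- Sample rows from the production table above (illustrative CTE, not tbl_pieza).
WITH piezas_demo (pieza, dtm_inicio_at, dtm_fin_at) AS (
    VALUES
        ('A', '2023-05-23 08:00:00'::timestamp, '2023-05-23 10:00:00'::timestamp),
        ('B', '2023-05-23 09:30:00'::timestamp, '2023-05-23 12:00:00'::timestamp),
        ('C', '2023-05-23 10:30:00'::timestamp, '2023-05-23 11:30:00'::timestamp),
        ('D', '2023-05-23 12:00:00'::timestamp, '2023-05-23 13:30:00'::timestamp)
),
-- The single interval used in the worked example.
rango AS (
    SELECT '2023-05-23 09:00:00'::timestamp AS inicio,
           '2023-05-23 11:00:00'::timestamp AS fin
),
-- Overlap in seconds divided by the piece's total duration in seconds,
-- clamped at 0 for pieces that do not intersect the interval.
-- No sample piece has zero duration, so the CASE guard from the
-- original query is omitted here.
fracciones AS (
    SELECT
        p.pieza,
        GREATEST(
            0,
            EXTRACT(epoch FROM LEAST(r.fin, p.dtm_fin_at) - GREATEST(r.inicio, p.dtm_inicio_at))
                / EXTRACT(epoch FROM p.dtm_fin_at - p.dtm_inicio_at)
        ) AS fraccion
    FROM piezas_demo p
    CROSS JOIN rango r
)
SELECT
    pieza,
    fraccion,
    SUM(fraccion) OVER () AS piezas_reales  -- same total repeated on every row
FROM fracciones
ORDER BY pieza;
```

Each row shows one piece's fraction, and the piezas_reales column repeats the total for the interval, so the per-piece values can be compared line by line with the manual calculation.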

sql postgresql query-optimization timescaledb
1 Answer

The first thing to do is to create a time-series table, with its single column as the primary key, to replace the generate_series calls in the query:

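-- Materialize every interval boundary (starts and ends) in one table.
-- UNION removes the boundaries that appear in both series.
CREATE TABLE TIME_SERIES AS
SELECT *
FROM generate_series(
            '2023-05-22 16:23:14'::timestamp,
            '2023-05-26 08:23:14'::timestamp,
            '8 hour'::interval
        ) AS d
UNION
SELECT *
FROM generate_series(
            '2023-05-23 00:23:14'::timestamp,
            '2023-05-26 16:23:14'::timestamp,
            '8 hour'::interval
        ) AS f;

-- Make the single column d the primary key, as described above.
ALTER TABLE TIME_SERIES ADD PRIMARY KEY (d);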
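The answer does not show how the table is meant to be used afterwards. One possible way, offered only as a sketch, is to rebuild the (inicio, fin) pairs by pairing each boundary with the next one. This assumes TIME_SERIES was created by the statement above, so that its column is named d, and that consecutive boundaries delimit one interval each.

```sql
-- Sketch: turn the stored boundaries back into (inicio, fin) ranges.
-- LEAD(d) fetches the next boundary in time order; the last boundary
-- has no successor, so its row is dropped.
SELECT inicio, fin
FROM (
    SELECT
        d AS inicio,
        LEAD(d) OVER (ORDER BY d) AS fin
    FROM TIME_SERIES
) AS t
WHERE fin IS NOT NULL
ORDER BY inicio;
```

A subquery of this shape could stand in for the Rangos CTE in the original query; whether that improves the plan would need to be confirmed with EXPLAIN (ANALYZE, BUFFERS).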