MySQL query performance: generating a report for a given date, with counts and cumulative counts, from a huge table

Problem description

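I am using MySQL and have a detail table that holds 2 million records in my local environment. The query below takes about 2 minutes to run against those 2 million records, but in production the table holds 80 million records on any given day.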

I have already created indexes on columns such as StatusDateTime, ProductionFacility, and ProductionStatusNo.

I have even partitioned the table on ProductionStatusNo, for the values 0, 1, 2, and so on up to 12.

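The query takes two parameters: p_StatusDateTime and p_UnassignedProductionFaciltiy. I use UNION because there are three different scenarios for computing the count and cumulative count of each status. Statuses 0 and 1 are special: for them I need NOT IN with a subquery to exclude certain records, which is not needed when computing the other statuses.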

SELECT 'Unassigned',0,p_StatusDateTime,
        COUNT(DISTINCT UniqueFormId )   ,    
        COUNT(DISTINCT UniqueFormId )       
        FROM detail
        WHERE ProductionFacility=p_UnassignedProductionFaciltiy and DATE(StatusDateTime)<=p_StatusDateTime
        and UniqueFormId not in ( select DISTINCT UniqueFormId from detail as exd where DATE(exd.StatusDateTime) <= p_StatusDateTime and exd.ProductionStatusNo!=1 )
        
        
      

union 
        SELECT ps.Status,ps.Id,p_StatusDateTime,
        CASE WHEN totalcount is null then 0 else totalcount end as count,
        (CASE WHEN totalcount is null then 0 else totalcount end) +
        COALESCE( (SELECT cumulative_count  FROM dailysummary WHERE ProductionStatusNo=ps.Id and DATE(StatusDateTime)=    DATE_SUB(p_StatusDateTime, INTERVAL 1 DAY) ),0)  as cumulative_count
         FROM productionstatus as ps left join (
        SELECT count(DISTINCT UniqueFormId) as totalcount ,
        DATE(p_StatusDateTime) as StatusDateTime,
        ProductionStatusNo 
        FROM detail as d
        WHERE DATE(StatusDateTime)=p_StatusDateTime 
         and UniqueFormId not in ( select DISTINCT UniqueFormId from detail as exd where DATE(exd.StatusDateTime) < p_StatusDateTime and exd.ProductionStatusNo=d.ProductionStatusNo )
        group by ProductionStatusNo ) as d ON ps.Id= d.ProductionStatusNo
        WHERE ps.Id =1 

UNION

        SELECT     ps.Status,     ps.Id,     p_StatusDateTime as StatusDateTime,     COALESCE(c, 0) as c,      COALESCE(c, 0) as c
        FROM  productionstatus as ps
        LEFT JOIN (    SELECT         COALESCE(COUNT(*), 0) as c,        ed.psn 
        FROM (
        SELECT
            UniqueFormId,MAX(productionStatusNo) as psn  FROM   detail
        WHERE   DATE(statusdatetime) <= p_StatusDateTime
        GROUP BY UniqueFormId
            ) as ed
        GROUP BY ed.psn
        ) as l ON ps.Id = l.psn
        WHERE
            ps.Id not in ( 0,1); 

Here is the table schema:

CREATE TABLE `detail` (
  `Id` int NOT NULL AUTO_INCREMENT,
  
  `EINNo` varchar(45) NOT NULL,
  `EmployeeNo` varchar(45) NOT NULL,
  `Form` varchar(45) NOT NULL,
  `ProductionStatus` varchar(45) NOT NULL,
  `ProductionStatusNo` int NOT NULL,
  
  `StatusDateTime` varchar(80) DEFAULT NULL,
  
  `UniqueFormId` varchar(450) NOT NULL,
  `ProductionFacility` varchar(450) NOT NULL,
  
  `IMB` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`Id`,`ProductionStatusNo`),
  KEY `idx_detail_EINNo` (`EINNo`),
  KEY `idx_detail_EmployeeNo` (`EmployeeNo`),
  KEY `idx_detail_Form` (`Form`),
  KEY `idx_ProductionStatus` (`ProductionStatus`),
  KEY `IX_detail_Covering` (`ProductionStatus`,`StatusDateTime`,`Id`),
  KEY `idx_detail_UniqueFormId` (`UniqueFormId`),
  KEY `idx_detail_UniqueFormId1` (`UniqueFormId`),
  KEY `idx_detail_ProductionFacility` (`ProductionFacility`),
  KEY `idx_detail_StatusDateTime` (`StatusDateTime`),
  KEY `idx_detail_ProductionStatusNo` (`ProductionStatusNo`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (`ProductionStatusNo`)
(PARTITION p0 VALUES LESS THAN (1) ENGINE = InnoDB,
 PARTITION p1 VALUES LESS THAN (2) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (3) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (4) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (5) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (6) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (7) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (8) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (9) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (10) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (11) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN (12) ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN (13) ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

Here is the code of the stored procedure that runs this query:

CREATE DEFINER=`root`@`localhost` PROCEDURE `Dashboard_Count`(
IN p_StatusDateTime Date, 
IN p_UnassignedProductionFaciltiy nvarchar(500)

)
BEGIN
    
         
    
        INSERT INTO dailysummary(ProductionStatus ,ProductionStatusNo,StatusDateTime ,count , cumulative_count ) 
        
         SELECT 'Unassigned',0,p_StatusDateTime,
        COUNT(DISTINCT UniqueFormId )   ,    
        COUNT(DISTINCT UniqueFormId )       
        FROM detail
        WHERE ProductionFacility=p_UnassignedProductionFaciltiy and DATE(StatusDateTime)<=p_StatusDateTime
        and UniqueFormId not in ( select DISTINCT UniqueFormId from detail as exd where DATE(exd.StatusDateTime) <= p_StatusDateTime and exd.ProductionStatusNo!=1 )
        
        
      

union 
SELECT ps.Status,ps.Id,p_StatusDateTime,
        CASE WHEN totalcount is null then 0 else totalcount end as count,
        (CASE WHEN totalcount is null then 0 else totalcount end) +
        COALESCE( (SELECT cumulative_count  FROM dailysummary WHERE ProductionStatusNo=ps.Id and DATE(StatusDateTime)=    DATE_SUB(p_StatusDateTime, INTERVAL 1 DAY) ),0)  as cumulative_count
         FROM productionstatus as ps left join (
        SELECT count(DISTINCT UniqueFormId) as totalcount ,
        DATE(p_StatusDateTime) as StatusDateTime,
        ProductionStatusNo 
        FROM detail as d
        WHERE DATE(StatusDateTime)=p_StatusDateTime 
         and UniqueFormId not in ( select DISTINCT UniqueFormId from detail as exd where DATE(exd.StatusDateTime) < p_StatusDateTime and exd.ProductionStatusNo=d.ProductionStatusNo )
        group by ProductionStatusNo ) as d ON ps.Id= d.ProductionStatusNo
        WHERE ps.Id =1 

        UNION

        SELECT     ps.Status,     ps.Id,     p_StatusDateTime as StatusDateTime,     COALESCE(c, 0) as c,      COALESCE(c, 0) as c
        FROM  productionstatus as ps
        LEFT JOIN (    SELECT         COALESCE(COUNT(*), 0) as c,        ed.psn 
        FROM (
        SELECT
            UniqueFormId,MAX(productionStatusNo) as psn  FROM   detail
        WHERE   DATE(statusdatetime) <= p_StatusDateTime
        GROUP BY UniqueFormId
            ) as ed
        GROUP BY ed.psn
        ) as l ON ps.Id = l.psn
        WHERE
            ps.Id not in ( 0,1);
END
mysql query-optimization
1 answer
  • Don't plow through the fact table; instead, build and maintain a summary table that holds daily subtotals.

    Do you need to include the current day's counts? (There are two ways to handle that.)

  • PARTITIONing is probably useless unless you plan to delete "old" data. Even then it only helps with the big delete (by using DROP PARTITION instead of a big DELETE).

  • Redundant:

    KEY `idx_ProductionStatus` (`ProductionStatus`),  -- DROP this
    KEY `IX_detail_Covering`   (`ProductionStatus`,`StatusDateTime`,`Id`),
    
  • Why select this twice?

    COUNT(DISTINCT UniqueFormId )
    
  • Please qualify each column with the table it comes from. I can't interpret this:

    WHERE  DATE(statusdatetime) <= p_StatusDateTime
    

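    Also, DATE(col) is not sargable: wrapping the column in a function keeps an index on that column from being used for the comparison. Perhaps we can work around it.

    And without aliases, I can't analyze the indexes.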

  • You are using LEFT JOIN but never checking for NULL. Perhaps a plain JOIN is more appropriate?

  • I don't think COALESCE is needed in this case, since COUNT(*) never returns NULL:

    COALESCE(COUNT(*), 0) as c
    
  • NOT IN ( SELECT ... ) is usually inefficient. Try

    NOT EXISTS ( SELECT 1 ... )

    or

    LEFT JOIN ... WHERE ... IS NULL
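
As a rough illustration of that last point, here is the 'Unassigned' branch of the question's query rewritten both ways. This is a sketch, not a tested replacement. It is written as it would sit inside the body of Dashboard_Count, so p_StatusDateTime and p_UnassignedProductionFaciltiy are the procedure's parameters; the alias d and the output column name unassigned_count are my own additions. DATE(...) is deliberately left as in the original so that only the anti-join changes.

```sql
-- Variant 1: NOT EXISTS, correlated on UniqueFormId.
SELECT COUNT(DISTINCT d.UniqueFormId) AS unassigned_count
FROM detail AS d
WHERE d.ProductionFacility = p_UnassignedProductionFaciltiy
  AND DATE(d.StatusDateTime) <= p_StatusDateTime
  AND NOT EXISTS (
        SELECT 1
        FROM detail AS exd
        WHERE exd.UniqueFormId = d.UniqueFormId            -- correlation replaces NOT IN
          AND DATE(exd.StatusDateTime) <= p_StatusDateTime
          AND exd.ProductionStatusNo <> 1
      );

-- Variant 2: LEFT JOIN ... WHERE ... IS NULL (an explicit anti-join).
SELECT COUNT(DISTINCT d.UniqueFormId) AS unassigned_count
FROM detail AS d
LEFT JOIN detail AS exd
       ON exd.UniqueFormId = d.UniqueFormId
      AND DATE(exd.StatusDateTime) <= p_StatusDateTime
      AND exd.ProductionStatusNo <> 1
WHERE d.ProductionFacility = p_UnassignedProductionFaciltiy
  AND DATE(d.StatusDateTime) <= p_StatusDateTime
  AND exd.UniqueFormId IS NULL;                            -- keep only forms with no match
```

Because UniqueFormId is declared NOT NULL, both variants should return the same count as the original NOT IN. Comparing their EXPLAIN output on production-sized data is the way to find out which one the optimizer handles best.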
    
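The summary-table bullet and the sargability remark can be combined into one sketch. Everything named here is hypothetical: the table detail_daily_subtotal, its columns, and the sample date are made up for illustration. It also assumes StatusDateTime is stored as DATETIME; the posted schema declares it varchar(80), so that assumption would have to be made true first.

```sql
-- Hypothetical summary table: one row per day and status.
CREATE TABLE IF NOT EXISTS detail_daily_subtotal (
  StatusDate         DATE            NOT NULL,
  ProductionStatusNo INT             NOT NULL,
  row_count          BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (StatusDate, ProductionStatusNo)
) ENGINE=InnoDB;

SET @day = '2024-01-15';   -- hypothetical day to summarize

-- Summarize exactly one day. The half-open range on the bare column
-- replaces DATE(StatusDateTime) = @day, so the predicate is sargable.
-- REPLACE makes a re-run for the same day overwrite the earlier rows.
REPLACE INTO detail_daily_subtotal (StatusDate, ProductionStatusNo, row_count)
SELECT @day, d.ProductionStatusNo, COUNT(*)
FROM detail AS d
WHERE d.StatusDateTime >= @day
  AND d.StatusDateTime <  @day + INTERVAL 1 DAY
GROUP BY d.ProductionStatusNo;

-- The report then reads the small table instead of the 80M-row one.
SELECT s.ProductionStatusNo, SUM(s.row_count) AS cumulative_rows
FROM detail_daily_subtotal AS s
WHERE s.StatusDate <= @day
GROUP BY s.ProductionStatusNo;
```

Note that row counts add up across days, whereas COUNT(DISTINCT UniqueFormId) does not: a form seen on two days would be counted twice if daily distinct counts were summed. A real summary table for this report would therefore need an additive measure, for example counting each form only on the first day it reaches a given status.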