Optimizing a query on a large database table (SQL)


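I am trying to optimize an SQL query on a large events table (10 million+ rows) for date range searches. I already have a unique index on this table on (lid, did, measurement, date). The query below tries to get the events for three measurement types (Kilowatts, Current, and Voltage) at every 2-second interval in the date column.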

SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events 
WHERE lid = 1 
  and did = 1
  and measurement IN ("Voltage") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Current") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Kilowatts") 
group by timekey

This is the table I am trying to query:

=============================================================
id  |  lid   |   did   |   measurement  |  date 
=============================================================
1   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:00
=============================================================
2   |  1     |   1     |   Current      | 2020-04-27 00:00:00
=============================================================
3   |  1     |   1     |   Voltage      | 2020-04-27 00:00:00
=============================================================
4   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:01
=============================================================
5   |  1     |   1     |   Current      | 2020-04-27 00:00:01
=============================================================
6   |  1     |   1     |   Voltage      | 2020-04-27 00:00:01
=============================================================
7   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:02
=============================================================
8   |  1     |   1     |   Current      | 2020-04-27 00:00:02
=============================================================
9   |  1     |   1     |   Voltage      | 2020-04-27 00:00:02

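The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected. However, I am using UNION to look up the different measurement values in the table, and I believe this may not be the best approach.

Can any SQL expert help me tune my query to improve performance?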

mysql sql database query-performance database-optimization
Answers

1 vote

You have one record per second for each measurement, and you want to select one record every two seconds.

You could try:

select *
from events
where 
    lid = 1 
    and did = 1 
    and measurement IN ('Voltage', 'Current', 'Kilowatts')
    and extract(second from date) % 2 = 0

This selects the records whose timestamp falls on an even second.
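Since the question is about date range searches, the same filter can be combined with explicit bounds on date. The following is a minimal sketch, assuming a one-day window (the two date literals are illustrative and not from the question) and that all three measurement types are wanted. The modulo predicate cannot use an index on its own, so the equality and range conditions are what let MySQL narrow the scan through the existing unique index on (lid, did, measurement, date).

```sql
-- One query instead of three UNION branches.
-- The date bounds below are hypothetical; substitute the window you need.
SELECT id, lid, did, measurement, date
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND date >= '2020-04-27 00:00:00'
  AND date <  '2020-04-28 00:00:00'
  AND MOD(SECOND(date), 2) = 0   -- keep only even seconds
ORDER BY measurement, date;
```

The half-open range (>= start, < end) avoids double counting rows that sit exactly on a window boundary when consecutive windows are queried.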

Alternatively, if you always have exactly one record per second, another option is row_number() (this requires MySQL 8.0):

select *
from (
    select 
        e.*, 
        row_number() over(partition by measurement order by date) rn
    from events e
    where 
        lid = 1 
        and did = 1 
        and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
where rn % 2 = 1

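But this is less accurate than the previous query: a single missing or duplicated second shifts every later row number, so the selected rows can drift onto odd seconds.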
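One way to keep the window-function approach while avoiding that drift is to number rows inside each 2-second bucket rather than across the whole result. This is only a sketch, not the answerer's method: it reuses the FLOOR(UNIX_TIMESTAMP(date)/2) bucket from the question, assumes MySQL 8.0, and assumes that the earliest row in each bucket is the one wanted.

```sql
-- Pick the first row per measurement in every 2-second bucket.
SELECT id, lid, did, measurement, date
FROM (
    SELECT
        e.*,
        ROW_NUMBER() OVER (
            PARTITION BY measurement, FLOOR(UNIX_TIMESTAMP(date) / 2)
            ORDER BY date
        ) AS rn
    FROM events AS e
    WHERE lid = 1
      AND did = 1
      AND measurement IN ('Voltage', 'Current', 'Kilowatts')
) AS t
WHERE rn = 1;
```

Because numbering restarts in every bucket, a gap or an extra reading affects only its own bucket and not the rows that follow.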


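0 votes

Your query is really three queries merged into one. Fortunately, they all select rows based on similar columns. If you want this query to run fast, you can add the following index: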

create index ix1 on events (lid, did, measurement);
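Before adding it, it may be worth checking whether the unique index mentioned in the question, on (lid, did, measurement, date), already serves these lookups, since (lid, did, measurement) is its leftmost prefix. A sketch of how to check, using only the table and columns from the question:

```sql
-- List the indexes that already exist on the table.
SHOW INDEX FROM events;

-- Show the execution plan; the `key` column reports which index is chosen
-- and the `rows` column gives the optimizer's estimate of rows examined.
EXPLAIN
SELECT id, lid, did, measurement, date
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement = 'Voltage';
```

If the plan already shows the unique index being used, a second index on the same leading columns is unlikely to help and adds write overhead.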

0 votes

In addition to the suggestions above, changing the PRIMARY KEY will give you more performance:

PRIMARY KEY(lid, did, date, measurement)

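and drop the id column.

Note that you may hit a hiccup if two readings land in exactly the same second. This can easily happen if one reading arrives just after a clock tick and the next arrives just before the following tick.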
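The primary key change could be applied roughly as follows. This is a sketch under several assumptions that the question does not confirm: id is the current primary key, nothing else references it, and lid, did, date and measurement are all declared NOT NULL. The statement rebuilds the whole table, so on 10 million+ rows it is safer to try it on a copy first.

```sql
-- Assumption: `id` is the existing primary key and is not referenced elsewhere.
ALTER TABLE events
    DROP PRIMARY KEY,
    DROP COLUMN id,
    ADD PRIMARY KEY (lid, did, date, measurement);
```

With InnoDB, rows are stored in primary key order, so this layout keeps readings for one (lid, did) pair physically adjacent and sorted by date, which suits date range scans. The new key also enforces the same-second restriction described above: a second reading for the same measurement within the same second is rejected as a duplicate.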
