Two similar SPATIAL columns with the same INDEX behave differently


I am simplifying ZIP code polygons in a MySQL database (v8.0) by reducing the number of coordinates in each polygon.

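So I have a table named zip_city with a column named boundary, which is the original multipolygon column, and I created another column, boundary_simplified, holding the simplified polygons. Both contain geometries with SRID 4326 (I have included the location column since it might matter):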

+---------------------+--------------------------------+------+-----+---------+----------------+
| Field               | Type                           | Null | Key | Default | Extra          |
+---------------------+--------------------------------+------+-----+---------+----------------+
| boundary            | multipolygon                   | NO   | MUL | NULL    |                |
| is_point            | tinyint unsigned               | NO   | MUL | 0       |                |
| boundary_simplified | multipolygon                   | NO   | MUL | NULL    |                |
+---------------------+--------------------------------+------+-----+---------+----------------+

Running SHOW INDEXES gives me this:

mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table    | Non_unique | Key_name                  | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city |          1 | idx_is_point              |            1 | is_point            | A         |           2 |     NULL |   NULL |      | BTREE      |         |               | YES     | NULL       |
| zip_city |          1 | boundary                  |            1 | boundary            | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
| zip_city |          1 | boundary_simplified       |            1 | boundary_simplified | A         |       34287 |       32 |   NULL |      | SPATIAL    |         |               | YES     | NULL       |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+

The two indexes look exactly the same, but when I run a query with ST_CONTAINS, it behaves differently for each column. For example:

mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND          is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
         is_point = 0      LIMIT 1;
+-------+
| zip   |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)

When I EXPLAIN the two queries, I see that the one using boundary_simplified does not use the spatial index:

mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
      is_point = 0      LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys         | key      | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | range | idx_is_point,boundary | boundary | 34      | NULL |    1 |    50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT zip      FROM zip_city
      WHERE
          ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
 AND
      is_point = 0      LIMIT 1;
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key          | key_len | ref   | rows  | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
|  1 | SIMPLE      | zip_city | NULL       | ref  | idx_is_point  | idx_is_point | 1       | const | 17143 |   100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

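Any clues about this? I feel like I am missing something simple, but I cannot find any information on it. Also, creating the index took ~23.25 seconds for the boundary column but only ~0.75 seconds for boundary_simplified (which is odd; do the coordinates affect the efficiency of the index?).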

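I tried dropping both indexes and creating them separately, and the behavior of each index did not change. Of course, I also tried FORCE INDEX and USE INDEX in the query, which led to the same or worse behavior.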

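Edit: thanks to user1191247's observation, I fixed the index listing shown above. Also, I did not show the full table definition because it did not seem useful.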

mysql indexing spatial spatial-query spatial-index
1 answer

Thanks to user1191247's comment, I looked up the information he asked for and found this:

| zip_city | CREATE TABLE `zip_city` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `state_id` int unsigned NOT NULL,
  `zip` mediumint(5) unsigned zerofill NOT NULL,
  `city` varchar(64) NOT NULL,
  `slug` varchar(64) NOT NULL,
  `location` point NOT NULL /*!80003 SRID 4326 */,
  `boundary` multipolygon NOT NULL /*!80003 SRID 4326 */,
  `is_point` tinyint unsigned NOT NULL DEFAULT '0',
  `fit_market` tinyint unsigned NOT NULL DEFAULT '0',
  `boundary_simplified` multipolygon NOT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_zip_to_city_state1_idx` (`state_id`),
  KEY `idx_zip` (`zip`),
  KEY `idx_slug` (`slug`),
  KEY `idx_city` (`city`),
  SPATIAL KEY `idx_location` (`location`),
  SPATIAL KEY `boundary` (`boundary`),
  SPATIAL KEY `boundary_simplified` (`boundary_simplified`),
  CONSTRAINT `fk_zip_to_city_state1` FOREIGN KEY (`state_id`) REFERENCES `state` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=41381 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |

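As you can see, boundary_simplified is missing the SRID definition, which is essential for the index to work properly. Running SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city; returned SRID 4326, so I did not think this was the problem, but the attribute is missing from the column definition itself. I solved it by running these queries: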

DROP INDEX boundary_simplified ON zip_city;

ALTER TABLE zip_city MODIFY COLUMN boundary_simplified MULTIPOLYGON NOT NULL SRID 4326; 

(this took ~53 seconds)

ALTER TABLE zip_city ADD SPATIAL INDEX idx_boundary_simplified (boundary_simplified); 

(this now takes about 24 seconds, which is already a good sign)

After that, the index works perfectly :)
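For anyone hitting the same thing: the MySQL 8.0 manual notes that the optimizer ignores SPATIAL indexes on columns declared without an SRID attribute, even when every stored value carries the same SRID. A minimal sketch of how one might spot such columns and confirm the fix, assuming the zip_city table from the question lives in the currently selected schema (the DATABASE() filter is my assumption, adjust as needed):

```sql
-- List the spatial columns of zip_city together with the SRID restriction
-- declared in the column definition. SRS_ID is NULL when the column was
-- created without an SRID attribute, regardless of what ST_SRID() returns
-- for the stored values.
SELECT COLUMN_NAME, GEOMETRY_TYPE_NAME, SRS_ID
FROM INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'zip_city';

-- After adding SRID 4326 to the column and rebuilding the index,
-- re-run EXPLAIN on the query from the question and check whether the
-- spatial index now shows up under possible_keys / key.
EXPLAIN SELECT zip
FROM zip_city
WHERE ST_Contains(
        boundary_simplified,
        ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}')
      )
  AND is_point = 0
LIMIT 1;
```

Before the fix, the first query would be expected to show NULL in SRS_ID for boundary_simplified and 4326 for boundary and location, matching the SHOW CREATE TABLE output above.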
