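I am trying to use PostGIS to find intersections between incidents (polygons) and watch zones (circles, each defined by a point and a radius). The baseline data will be more than 10,000 polygons and 500,000 circles. I am also very new to PostGIS.
I have tried a few things, but the execution time is very long. Can someone suggest optimizations or a better way to do this with PostGIS? Here is what I have tried:
1. Using the geometry data type: I stored the incidents and watch zones as geometry, created GiST indexes on them, and used ST_DWithin to find the intersections.
The output for 1 incident against 500,000 watch zones takes 6.750 seconds. This is the best time of my attempts, but the problem is that my radius is in meters, while ST_DWithin on the geometry type expects it in SRID units. I cannot figure out this conversion.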
CREATE TABLE incident (
incident_id SERIAL NOT NULL,
incident_name VARCHAR(20),
incident_span GEOMETRY(POLYGON, 4326),
CONSTRAINT incident_id PRIMARY KEY (incident_id)
);
CREATE TABLE watchzones (
id SERIAL NOT NULL,
date_created timestamp with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
Longitude NUMERIC(10, 7) DEFAULT NULL,
radius integer,
position GEOMETRY(POINT, 4326),
CONSTRAINT id PRIMARY KEY (id)
);
CREATE INDEX ix_spatial_geom on watchzones using gist(position);
CREATE INDEX ix_spatial_geom_1 on incident using gist(incident_span);
Insert into incident values (
1,
'test',
ST_GeomFromText('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))',
4326
)
);
insert into watchzones
SELECT generate_series(1, 500000) AS id,
now(),
-29.21073,
152.93322,
'50',
ST_GeomFromText('POINT( 152.93322 -29.21073)', 4326);
explain analyze SELECT wz.id,
i.incident_id
FROM watchzones wz,
incident i
WHERE ST_DWithin(incident_span,position,wz.radius);
"Nested Loop (cost=0.14..227467.00 rows=42 width=8) (actual time=0.142..1506.476 rows=500000 loops=1)"
" -> Seq Scan on watchzones wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.109..47.822 rows=500000 loops=1)"
" -> Index Scan using ix_spatial_geom_1 on incident i (cost=0.14..0.42 rows=1 width=284) (actual time=0.002..0.002 rows=1 loops=500000)"
" Index Cond: (incident_span && st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(incident_span, wz."position", (wz.radius)::double precision))"
"Planning time: 0.150 ms"
"Execution time: 1523.312 ms"
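2. Using the geography data type:
Here the output for 1 incident against 500,000 watch zones takes about 29.987 seconds, which is very slow. Note that I have tried both GiST and BRIN indexes and have run VACUUM ANALYZE on the tables.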
CREATE TABLE watchzones_geog
(
id SERIAL PRIMARY KEY,
date_created TIMESTAMP with time zone DEFAULT now(),
latitude NUMERIC(10, 7) DEFAULT NULL,
longitude NUMERIC(10, 7) DEFAULT NULL,
radius INTEGER,
position geography(point)
);
CREATE INDEX watchzones_geog_gix ON watchzones_geog USING GIST (position);
insert into watchzones_geog
SELECT generate_series(1,500000) AS id, now(),-29.21073,152.93322,'50',ST_GeogFromText('POINT(152.93322 -29.21073)');
CREATE TABLE incident_geog (
incident_id SERIAL PRIMARY KEY,
incident_name VARCHAR(20),
incident_span GEOGRAPHY(POLYGON)
);
CREATE INDEX incident_geog_gix ON incident_geog USING GIST (incident_span);
Insert into incident_geog values (1,'test', ST_GeogFromText
('POLYGON((152.945470916 -29.212227933,152.942130026 -29.213431145,152.939345911 -29.2125423759999,152.935144791 -29.21454003,152.933185494 -29.2135838469999,152.929481762 -29.216065516,152.929698621 -29.217402937,152.927245999
-29.219576,152.921539 -29.217676,152.918487996 -29.2113786959999,152.919254355 -29.206029929,152.919692387 -29.2027824419999,152.936020197 -29.207567346,152.944901258 -29.207729953,152.945470916
-29.212227933))'));
explain analyze SELECT i.incident_id,
wz.id
FROM watchzones_geog wz,
incident_geog i
WHERE St_dwithin(position, incident_span, radius);
"Nested Loop (cost=0.27..348717.00 rows=17 width=8) (actual time=0.277..18551.844 rows=500000 loops=1)"
" -> Seq Scan on watchzones_geog wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.102..50.052 rows=500000 loops=1)"
" -> Index Scan using incident_geog_gix on incident_geog i (cost=0.27..0.67 rows=1 width=711) (actual time=0.036..0.036 rows=1 loops=500000)"
" Index Cond: (incident_span && _st_expand(wz."position", (wz.radius)::double precision))"
" Filter: ((wz."position" && _st_expand(incident_span, (wz.radius)::double precision)) AND _st_dwithin(wz."position", incident_span, (wz.radius)::double precision, true))"
"Planning time: 0.155 ms"
"Execution time: 18587.041 ms"
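3. I also tried creating a circle with ST_Buffer(position, radius, 'quad_segs=8') and then using ST_Intersects. With this approach the query takes more than a minute for both the geometry and geography data types.
It would be great if someone could suggest a better approach or some optimization to speed up execution.
Thanks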
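The queries are fine, but your sample data is flawed. First, note that a query optimized for 1 polygon may not be the same as a query optimized for thousands of them.
The main problem is the sample points. There are in fact 500,000 points at exactly the same location, so depending on the polygon being intersected, the query returns either 0 or 500,000 results. PostGIS first uses the index to find the points/polygons whose bounding boxes intersect, then refines the result by computing the true distance. With your sample it has to compute the distance 500,000 times, which is slow.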
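One quick way to see this in the sample is to count how many distinct locations the points occupy. This is only a sketch, assuming the watchzones_geog table from the question has been loaded as shown there; the column aliases are illustrative names.

```sql
-- Compare the number of watch zones with the number of distinct locations.
-- ST_AsText renders each geography point as WKT so it can be compared as text.
SELECT count(*)                            AS total_zones,
       count(DISTINCT ST_AsText(position)) AS distinct_locations
FROM watchzones_geog;
```

If distinct_locations is tiny compared with total_zones, the bounding-box test cannot discriminate between rows, and every row goes through the exact distance computation.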
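Using a point layer with random locations (within 1 degree), the query takes less than 1 second, because it only has to compute the distance for 20 locations.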
INSERT INTO watchzones_geog
SELECT generate_series(1,500000) AS id, now(),0,0,'50',
ST_makePoint(152.93322+random(),-29.21073+random())::geography;
explain analyze SELECT i.incident_id,
wz.id
FROM watchzones_geog wz,
incident_geog i
WHERE St_dwithin(position, incident_span, radius);
Nested Loop (cost=0.00..272424.01 rows=1 width=8) (actual time=25.956..921.846 rows=20 loops=1)
Join Filter: ((wz."position" && _st_expand(i.incident_span, (wz.radius)::double precision)) AND (i.incident_span && _st_expand(wz."position", (wz.radius)::double precision)) AND _st_dwithin(wz."position", i.incident_span, (wz.radius)::double precision, true))
Rows Removed by Join Filter: 499980
-> Seq Scan on incident_geog i (cost=0.00..1.01 rows=1 width=36) (actual time=0.009..0.009 rows=1 loops=1)
-> Seq Scan on watchzones_geog wz (cost=0.00..11173.00 rows=500000 width=40) (actual time=0.006..65.625 rows=500000 loops=1)
Planning time: 1.887 ms
Execution time: 921.895 ms
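On the separate problem of using a radius in meters with the geometry tables: with SRID 4326, the third argument of ST_DWithin is read in degrees, so a radius of 50 there means 50 degrees, not 50 meters. A minimal sketch of one possible workaround, which I have not benchmarked on this data, is to cast both columns to geography at query time so that the distance is measured in meters. The index names below are hypothetical, and the expression indexes are an assumption about what the planner needs in order to use an index with the cast.

```sql
-- Hypothetical expression indexes on the geography cast of the existing
-- geometry columns, so the cast form of the query can still use an index.
CREATE INDEX ix_watchzones_position_geog
    ON watchzones USING gist (("position"::geography));
CREATE INDEX ix_incident_span_geog
    ON incident USING gist ((incident_span::geography));

-- ST_DWithin on geography measures the distance in meters,
-- so wz.radius can stay in meters.
SELECT wz.id,
       i.incident_id
FROM watchzones wz
JOIN incident i
  ON ST_DWithin(i.incident_span::geography,
                wz."position"::geography,
                wz.radius);
```

Whether this is faster than storing the data as geography in the first place would need to be checked with EXPLAIN ANALYZE on realistic, well-distributed points.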