I have been testing GeoMesa with simple spatial queries and comparing it to PostGIS. For example, this SQL query runs in 30 seconds in PostGIS:
with series as (
select
generate_series(0, 5000) as i
),
points as (
select ST_Point(i, i*2) as geom from series
)
select st_distance(a.geom, b.geom) from points as a, points as b
Now, the following GeoMesa version takes 5 minutes (using -Xmx10g):
import org.apache.spark.sql.SparkSession
import org.locationtech.geomesa.spark.jts._
import org.locationtech.jts.geom._

object HelloWorld {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .config("spark.sql.crossJoin.enabled", "true")
      .config("spark.executor.memory", "12g")
      .config("spark.driver.memory", "12g")
      .config("spark.cores.max", "4")
      .master("local")
      .appName("Geomesa")
      .getOrCreate()
    spark.withJTS
    import spark.implicits._

    val x = 0 until 5000
    val y = for (i <- x) yield i*2
    val coords = for ((i, n) <- x.zipWithIndex) yield (i, y(n))
    val points = for (i <- coords) yield new GeometryFactory().createPoint(new Coordinate(i._1, i._2))
    val points2 = for (i <- coords) yield new GeometryFactory().createPoint(new Coordinate(i._1, i._2))
    val all_points = for {
      i <- points
      j <- points2
    } yield (i, j)

    val df = all_points.toDF("point", "point2")
    val df2 = df.withColumn("dist", st_distance($"point", $"point2"))
    df2.show()
  }
}
I was expecting similar or better performance from GeoMesa. How can a query like this be tuned?
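GeoMesa is not going to be as fast as PostGIS for small amounts of data. GeoMesa is designed for distributed NoSQL databases. If your dataset fits in PostGIS, you should probably just use PostGIS. Once you start hitting the limits of PostGIS, that is the point to consider GeoMesa. GeoMesa does offer integration with arbitrary GeoTools data stores (including PostGIS), which makes some of the GeoMesa Spark and command-line features available for PostGIS data.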
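Separately from the choice of database, the Scala code in the question builds all 5000 × 5000 = 25,000,000 point pairs as a local collection on the driver before Spark sees them, and `.master("local")` runs Spark with a single worker thread. The following is only a minimal sketch of one possible restructuring, not a benchmarked fix: it creates the 5000 points once and lets Spark form the pairs with `crossJoin`. The object name `PairwiseDistances` and the helper `pairwiseDistances` are hypothetical names introduced for illustration; `local[*]` (all local cores) is an assumption about the intended setup, and no timing is claimed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.locationtech.geomesa.spark.jts._
import org.locationtech.jts.geom.{Coordinate, GeometryFactory, Point}

// Hypothetical object and helper names, used only for this sketch.
object PairwiseDistances {

  // Builds n points (i, 2i) on the driver, then lets Spark form the n * n pairs.
  // Assumes spark.withJTS has already been called on the session.
  def pairwiseDistances(spark: SparkSession, n: Int): DataFrame = {
    import spark.implicits._

    // One shared factory instead of a new GeometryFactory per point.
    val factory = new GeometryFactory()
    val points: Seq[Tuple1[Point]] =
      (0 until n).map(i => Tuple1(factory.createPoint(new Coordinate(i, i * 2))))

    val a = points.toDF("point")
    val b = points.toDF("point2")

    // The cartesian product is planned and executed by Spark,
    // not materialized as a Scala collection on the driver.
    a.crossJoin(b).withColumn("dist", st_distance($"point", $"point2"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]") // assumption: use all local cores instead of one thread
      .appName("Geomesa")
      .getOrCreate()
    spark.withJTS // register the JTS types and st_* functions

    val df = pairwiseDistances(spark, 5000)
    df.show()
    spark.stop()
  }
}
```

Note that `show()` prints only the first 20 rows, whereas the PostGIS query returns every row, so the two timings in the question do not measure the same work. Forcing the full result in Spark (for example with an aggregate over `dist`) would give a fairer comparison.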