I have a GeoDataFrame with a column of shapely polygons. Some of them are distinct, some are not:
In [1]: gdf
Out[2]:
geometry
1 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
3 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
4 POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))
5 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
I need to find only the distinct (non-overlapping) polygons:
In [1]: gdf_distinct
Out[2]:
geometry
1 POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))
2 POLYGON ((1 3, 1 4, 2 4, 2 3, 1 3))
4 POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))
Since polygons are not hashable, I can't use the simple approach in Pandas:
In [1]: gdf_distinct = gdf['geometry'].unique()
TypeError: unhashable type: 'Polygon'
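Is there a simple and efficient way to get a new GeoDataFrame containing only the distinct polygons?
P.S.
I found a way, but it only works for exactly duplicated polygons, and I don't think it is very efficient: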
In [1]: m = []
   ...: for index, row in gdf.iterrows():
   ...:     # list membership compares geometries with ==, one by one
   ...:     if row['geometry'] not in m:
   ...:         m.append(row['geometry'])
   ...: gdf_distinct = GeoDataFrame(geometry=m)
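For the exact-duplicate case only, one possible sketch is to compare a hashable representation of each polygon instead of the polygon itself. The example below uses the WKB bytes of each geometry, which assumes that duplicates list the same vertices in the same order; it rebuilds the sample data from the question and does not deal with partially overlapping polygons.

```python
import geopandas
import pandas as pd
from shapely.geometry import Polygon

# Sample data from the question, with the same index labels 1..5
gdf = geopandas.GeoDataFrame(
    geometry=[
        Polygon([(1, 1), (1, 2), (2, 2), (2, 1)]),
        Polygon([(1, 3), (1, 4), (2, 4), (2, 3)]),
        Polygon([(1, 1), (1, 2), (2, 2), (2, 1)]),
        Polygon([(3, 1), (3, 2), (4, 2), (4, 1)]),
        Polygon([(1, 3), (1, 4), (2, 4), (2, 3)]),
    ],
    index=[1, 2, 3, 4, 5],
)

# bytes objects are hashable, so pandas can flag repeated WKB values
wkb = pd.Series([geom.wkb for geom in gdf.geometry], index=gdf.index)

# Keep the first occurrence of each geometry, along with any other columns
gdf_distinct = gdf[~wkb.duplicated()]
print(gdf_distinct)
```

This keeps the original index labels, so the remaining rows can still be matched back to the source frame.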
Let's start with a list of 4 polygons, three of which overlap with other polygons:
from shapely.geometry import Polygon
import geopandas
polygons = [
Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]]),
Polygon([[1, 3], [1, 5], [3, 5], [3, 3], [1, 3]]),
Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]]),
Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]]),
]
gdf = geopandas.GeoDataFrame(data={'A': list('ABCD')}, geometry=polygons)
gdf.plot(column='A', alpha=0.75)
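They look like this:
So we can loop over each one, then loop over all the others, and use the shapely API to check whether there is any overlap. If nothing overlaps, we append it to our output list: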
non_overlapping = []
for p in polygons:
    overlaps = []
    # compare p against every polygon that is not p itself
    for g in filter(lambda g: not g.equals(p), polygons):
        overlaps.append(g.overlaps(p))
    if not any(overlaps):
        non_overlapping.append(p)
Which gives me (shown as WKT):
['POLYGON ((3 1, 3 2, 4 2, 4 1, 3 1))']
Which is exactly what I expected.
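It may help to see why only the last polygon survives. As far as shapely's predicates go, `overlaps` is true only when the interiors intersect and neither geometry covers the other, so polygons that merely share an edge, or that are identical, do not count as overlapping. A small check using three of the polygons above (the variable names `a`, `c`, `d` are just for illustration):

```python
from shapely.geometry import Polygon

a = Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]])
c = Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]])
d = Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]])

print(a.overlaps(c))  # True: the interiors share an area
print(a.overlaps(d))  # False: they only share the edge x = 3
print(a.touches(d))   # True
print(a.overlaps(a))  # False: identical polygons are equal, not overlapping
print(a.equals(a))    # True
```

This also means that exact duplicates, like the ones in the question, are not flagged by `overlaps` and need an equality check instead.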
But this is effectively O(N^2), and I don't think it has to be.
So let's try not to check the same pair twice:
overlapping = set()  # positions of polygons that overlap at least one other
for n, p in enumerate(polygons):
    for m in range(n + 1, len(polygons)):  # loop from the next element to the end
        if polygons[m].overlaps(p):
            overlapping.update((n, m))  # an overlap rules out both members of the pair
non_overlapping = [str(p) for n, p in enumerate(polygons) if n not in overlapping]
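I get the same result, and it's a bit faster on my machine.
We can also condense the loop a little by using a generator in the if statement instead of a regular for block: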
non_overlapping = []
for n, p in enumerate(polygons):
    # any() stops at the first overlap; p is compared with every other polygon
    if not any(p.overlaps(g) for m, g in enumerate(polygons) if m != n):
        non_overlapping.append(p)
Same story.
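If the pairwise comparison ever becomes the bottleneck, a spatial index might avoid most of the work by only testing polygons whose bounding boxes intersect. The sketch below is one way to try that with the `sindex` attribute of the GeoDataFrame; it assumes a geopandas version whose `sindex.query` accepts a `predicate` argument, and I have not benchmarked it against the loops above.

```python
import geopandas
from shapely.geometry import Polygon

polygons = [
    Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]]),
    Polygon([[1, 3], [1, 5], [3, 5], [3, 3], [1, 3]]),
    Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]]),
    Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]]),
]
gdf = geopandas.GeoDataFrame(data={'A': list('ABCD')}, geometry=polygons)

sindex = gdf.sindex  # spatial index over the geometry column

# For each polygon, ask the index for the positions of polygons it overlaps.
# A polygon never "overlaps" itself, so no self-match has to be filtered out.
keep = [
    len(sindex.query(geom, predicate="overlaps")) == 0
    for geom in gdf.geometry
]
gdf_non_overlapping = gdf[keep]
print(gdf_non_overlapping)
```

Because the result is a filtered GeoDataFrame rather than a list of geometries, column `A` comes along for free.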
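Thanks to @Paul H for the great answer and to @alphabetasoup for the thoughtful comment.
Although my solution doesn't answer the question in a different way, it is related. My use case involved finding only the overlapping polygons. To do that I made a small code modification, and found that I needed to include the last element so that I wouldn't lose one of the overlapping polygons.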
# Find polygons in a geopandas dataframe that overlap with another polygon
# in the same dataframe as well as non-overlapping polygons
geoms = list(gdf.geometry)
overlapping = []
non_overlapping = []
for n, p in enumerate(geoms):  # Included the last element
    # Compare against every other polygon, not only the later ones
    overlaps = [g.overlaps(p) for i, g in enumerate(geoms) if i != n]
    if any(overlaps):
        overlapping.append(p)
    else:
        non_overlapping.append(p)  # Did not store as string
My use case also needed to keep the other columns of the original geopandas GeoDataFrame. This is how I did it:
geoms = list(gdf.geometry)
overlapping = []
non_overlapping = []
for n, p in enumerate(geoms):  # Used Pythonic zero-based indexing
    # Compare against every other polygon, earlier ones included
    if any(p.overlaps(g) for i, g in enumerate(geoms) if i != n):
        # Store the position from the original dataframe
        overlapping.append(n)
    else:
        non_overlapping.append(n)
# Create new dataframes and reset their indexes
gdf_overlapping = gdf.iloc[overlapping].reset_index(drop=True)
gdf_non_overlapping = gdf.iloc[non_overlapping].reset_index(drop=True)
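If you would rather keep the original index labels than reset them, a boolean mask aligned with the dataframe index is another option. Here is a minimal sketch using the four sample polygons from the answer above; `overlap_mask` is a hypothetical helper name of my own, not something geopandas provides.

```python
import geopandas
import pandas as pd
from shapely.geometry import Polygon


def overlap_mask(gdf):
    """Hypothetical helper: True where a row's polygon overlaps any other row's."""
    geoms = list(gdf.geometry)
    flags = [
        any(p.overlaps(g) for j, g in enumerate(geoms) if j != i)
        for i, p in enumerate(geoms)
    ]
    # Align the flags with the dataframe's own index labels
    return pd.Series(flags, index=gdf.index)


polygons = [
    Polygon([[1, 1], [1, 3], [3, 3], [3, 1], [1, 1]]),
    Polygon([[1, 3], [1, 5], [3, 5], [3, 3], [1, 3]]),
    Polygon([[2, 2], [2, 3.5], [3.5, 3.5], [3.5, 2], [2, 2]]),
    Polygon([[3, 1], [3, 2], [4, 2], [4, 1], [3, 1]]),
]
gdf = geopandas.GeoDataFrame(data={'A': list('ABCD')}, geometry=polygons)

mask = overlap_mask(gdf)
gdf_overlapping = gdf[mask]        # all columns kept, original labels kept
gdf_non_overlapping = gdf[~mask]
```

The mask can also be stored as a column (for example `gdf['overlaps'] = mask`) if both groups need to stay in one frame.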