A way to speed up 19 calls to geopandas.read_file()?

Question · Votes: -1 · Answers: 1

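I have some (someone else's) code that calls geopandas.read_file() to read 19 shapefiles totaling about 2.7 GB. It takes about a minute to run, and I would like to know whether there is any way to speed it up.

The only thing I have come up with is loading the 19 shapefiles asynchronously, but it looks like I would have to fork Geopandas and write my own read function to do that.

Does anyone know of a simpler way?

Many thanks.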

python asynchronous geopandas

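1 Answer

0
votes

If geopandas does its work in a C extension that releases the Global Interpreter Lock (GIL), then threads would give you some overlap between the read and process phases, plus some parallel execution, which might improve performance. I don't think the gain will be dramatic, but it is worth a try.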

import multiprocessing as mp
import multiprocessing.pool
import geopandas

files_to_read = ["foo", "bar", "baz"]

# guessing a max of 4 threads would be reasonable since much of read_file
# will likely be done in a C extension without the GIL
pool = mp.pool.ThreadPool(min(mp.cpu_count(), len(files_to_read), 4))
# chunksize=1 hands out one file at a time so the work spreads evenly across threads
frames = pool.map(geopandas.read_file, files_to_read, chunksize=1)
pool.close()
pool.join()
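
The same idea can also be written with the standard library's concurrent.futures, which cleans up the pool automatically. This is only a minimal sketch: read_all, reader and max_workers are hypothetical names chosen for illustration, and the cap of 4 threads is the same guess as in the snippet above, not a measured optimum.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_all(paths, reader, max_workers=4):
    """Call reader(path) for each path on a thread pool.

    `reader` is any one-argument callable (hypothetical parameter);
    results are returned in the same order as `paths`.
    """
    if not paths:
        return []
    # Never start more threads than files, CPUs, or the requested cap.
    workers = max(1, min(max_workers, len(paths), os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map preserves input order; the with-block waits for
        # all threads to finish and then shuts the pool down.
        return list(executor.map(reader, paths))
```

With geopandas installed, this would be called as frames = read_all(files_to_read, geopandas.read_file). Whether it is actually faster depends on how much of the read releases the GIL, so it is worth timing both versions on the real files.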
