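I have some large GeoJSON files (see California.geojson for an example: https://github.com/microsoft/USBuildingFootprints?tab=readme-ov-file#download-links). I want to convert them to .csv (importing with \copy in Postgres is faster), and I pipe the result to STDIN.
The features are "type": "Polygon", but for my needs this is essentially point data: I don't need the full geometry (or the properties). jq seems well suited to the task.
Unfortunately, some of the largest files appear to be too big to hold in memory (the process ended with a "Killed" message). I tried the --stream option, but either I did not understand it or the process is very slow (more than 3 hours and still "running").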
A sample can be produced like this (see the bottom of this post for a copy of it):
jq '.features = .features[:5]' data/Alabama.geojson > sample.geojson
This works well for the "smaller" GeoJSON files (< 1.4 GB):
jq '.features | map(.geometry.coordinates) | map(.[]) | map(first) | .[] | {"long": first, "lat": last} | [.long, .lat] | @csv' small.geojson
On the larger files, however, I get a "Killed" message (I assume I am running out of memory).
I then tried --stream, although I am not sure I understood it correctly (this post and this issue helped a lot).
Here is my version using --stream (lots of "hacks"):
cat sample.geojson | jq --stream "fromstream(1|truncate_stream(inputs))" | jq ' map(.geometry.coordinates) | map(.[]) | map(first) | .[] | {"long": first, "lat": last} | [.long, .lat] | @csv'
It works on sample.geojson but fails on large GeoJSON files (e.g. "Ohio.geojson"). Any ideas?
I also tried writing to a file, without success.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959634,
32.421887
],
[
-84.95982,
32.421889
],
[
-84.959822,
32.421797
],
[
-84.959767,
32.421796
],
[
-84.959767,
32.421771
],
[
-84.959636,
32.421769
],
[
-84.959634,
32.421887
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959636,
32.42095
],
[
-84.959715,
32.42095
],
[
-84.959714,
32.420984
],
[
-84.959816,
32.420985
],
[
-84.959818,
32.420849
],
[
-84.959637,
32.420848
],
[
-84.959636,
32.42095
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.959998,
32.235231
],
[
-84.959877,
32.235231
],
[
-84.959877,
32.235288
],
[
-84.959998,
32.235288
],
[
-84.959998,
32.235231
]
]
]
},
"properties": {
"release": 1,
"capture_dates_range": ""
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.960253,
32.422248
],
[
-84.960069,
32.422245
],
[
-84.960067,
32.422321
],
[
-84.960165,
32.422323
],
[
-84.960164,
32.422364
],
[
-84.96025,
32.422365
],
[
-84.960253,
32.422248
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
},
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-84.961602,
32.419206
],
[
-84.961599,
32.419354
],
[
-84.961707,
32.419355
],
[
-84.961708,
32.419291
],
[
-84.961794,
32.419292
],
[
-84.961796,
32.419208
],
[
-84.961602,
32.419206
]
]
]
},
"properties": {
"release": 2,
"capture_dates_range": "3/26/2020-7/22/2020"
}
}
]
}
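Your original filter can be simplified as follows:
- .features stays as is.
- map(.geometry.coordinates) | map(.[]) | map(first) is really just map(.geometry.coordinates | .[] | first).
- map(…) | .[] can then be reduced to .[] | …
- {"long": first, "lat": last} | [.long, .lat] builds an object and immediately converts it to an array. This could be simplified to [first,last], but since the array only has two items to begin with, this part just returns its input and can be dropped entirely.
- @csv stays as is.
All in all, you can achieve the same result with
.features[].geometry.coordinates[][0] | @csv
This descends five levels (.features, [], .geometry, .coordinates, []), then picks the first item and converts it to CSV output. So it can be translated into a --stream version as follows:
fromstream(5|truncate_stream(inputs))[0] | @csv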
-114.127454,34.265674
-114.127694,34.260939
-114.127988,34.264977
-114.129007,34.260229
-114.129611,34.261105
-114.130311,34.263922
-114.131834,34.284069
-114.132183,34.28509
-114.132634,34.281492
-114.133764,34.282816
:
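The streaming filter above is shown without its command-line flags. As a minimal sketch, assuming jq 1.5 or later, it could be wrapped like this: --stream makes jq emit path/value events instead of whole documents, -n keeps the first event from being consumed before inputs reads the rest, and -r prints the CSV rows without JSON string quoting. The function name geojson_first_points and the table name in the usage comment are hypothetical, chosen only for illustration.

```shell
# Sketch only: print "long,lat" for the first point of every polygon ring,
# without loading the whole FeatureCollection into memory.
# Assumes jq >= 1.5; geojson_first_points is a hypothetical helper name.
geojson_first_points() {
  # --stream : parse the input as a sequence of [path, value] events
  # -n       : do not read an input up front, so `inputs` sees every event
  # -r       : raw output, one CSV row per line
  jq --stream -n -r 'fromstream(5|truncate_stream(inputs))[0] | @csv' "$1"
}

# Possible usage (the table "points" and its columns are assumptions):
#   geojson_first_points Ohio.geojson \
#     | psql -c "\copy points(long, lat) FROM STDIN WITH (FORMAT csv)"
```

Because fromstream only ever rebuilds one ring at a time here, memory use should stay small regardless of file size; the question's depth of 1, by contrast, reassembles the entire features array before anything is emitted.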