How to geocode simple addresses using the Data Science Toolkit

Question  Votes: 0  Answers: 3

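I got tired of Google's geocoding and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) lets you geocode an unlimited number of addresses. R has an excellent package that wraps its functions (CRAN: RDSTK). The package has a function named street2coordinates() that interfaces with the Data Science Toolkit's geocoding utility.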

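However, if you try to geocode something simple such as "City, Country", the RDSTK function street2coordinates() does not work. In the example below, I try to use the function to get the latitude and longitude of Phoenix: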

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)

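The utility on the Data Science Toolkit site itself works fine. Here is the URL request that returns the answer: http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States

I am interested in geocoding multiple addresses (both full street addresses and city names). I know the Data Science Toolkit URL works well for this.

How can I interact with the URL and get multiple latitudes and longitudes into a data frame that contains the addresses?

Here is a sample dataset: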

dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))
r maps geocoding
3 Answers
16
votes

Like this:

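library(httr)
library(rjson)

# Build a JSON array of address strings for the request body
data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
json     <- fromJSON(content(response,as="text"))
# lapply (not sapply) always returns a list, which is what do.call needs;
# addresses with no match yield NULL and are dropped by rbind
geocode  <- do.call(rbind,lapply(json,
                                 function(x) c(long=x$longitude,lat=x$latitude)))
geocode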
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713

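This uses the POST interface of the street2coordinates API (see the API documentation), which returns all results in a single request instead of requiring one GET request per address.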
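
The request body used above is a JSON array of address strings. As a minimal sketch, assuming the jsonlite package is installed (it is not used in the answer above), the same body can be produced by a JSON serializer instead of manual pasting, which would also take care of escaping if an address ever contained a double quote:

```r
library(jsonlite)

# Two addresses taken from the sample dataset
addresses <- c("Phoenix, Arizona, United States",
               "Tucson, Arizona, United States")

# toJSON() turns a character vector into a JSON array of strings
body <- as.character(toJSON(addresses))

# The manual construction from the answer, for comparison
manual <- paste0("[", paste(paste0("\"", addresses, "\""), collapse = ","), "]")

cat(body, "\n")
stopifnot(identical(body, manual))

# Round trip: parsing the body gives back the original vector
stopifnot(identical(fromJSON(body), addresses))
```

The resulting string could then be passed as the body argument of POST() in the same way as the data variable above.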

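The missing Phoenix row appears to be a bug in the street2coordinates API. If you go to the API demo page and try "Phoenix, Arizona, United States", you get an empty response. However, as your example shows, their "Google-style geocoder" does return a result for Phoenix. So here is a solution that uses repeated GET requests. Note that it runs much more slowly: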

library(httr)
library(rjson)

# Geocode a single address with the Data Science Toolkit's Google-style geocoder
geo.dsk <- function(addr) {
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  # Parameters passed via `query` are URL-encoded by httr
  response <- GET(url, query = list(sensor = "false", address = addr))
  json     <- fromJSON(content(response, as = "text"))
  loc      <- json$results[[1]]$geometry$location  # first match only
  c(address = addr, long = loc$lng, lat = loc$lat) # coerced to character
}
# One GET request per address; rbind stacks the named vectors into a matrix
result <- do.call(rbind, lapply(as.character(dff$address), geo.dsk))
result <- data.frame(result)
result
#                                     address         long        lat
# 1        Birmingham, Alabama, United States   -86.801904  33.456412
# 2            Mobile, Alabama, United States   -88.103184  30.701142
# 3           Phoenix, Arizona, United States -112.0733333 33.4483333
# 4            Tucson, Arizona, United States  -110.970869  32.217975
# 5      Little Rock, Arkansas, United States   -91.207356  33.608922
# 6       Berkeley, California, United States   -122.29673  37.860576
# 7         Duarte, California, United States  -118.298662  33.786594
# 8      Encinitas, California, United States  -116.846046  33.016928
# 9       La Jolla, California, United States  -117.876447  33.857515
# 10   Los Angeles, California, United States  -117.885359  35.187133
# 11        Orange, California, United States  -117.853112  33.787795
# 12  Redwood City, California, United States  -117.885359  35.187133
# 13    Sacramento, California, United States  -121.555406  38.380456
# 14 San Francisco, California, United States  -117.885359  35.187133
# 15      Stanford, California, United States    -122.1675   37.42509
# 16     Hartford, Connecticut, United States   -72.763564   41.78516
# 17    New Haven, Connecticut, United States   -72.927507  41.365709
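
geo.dsk() above assumes that every response contains at least one result. The following is a minimal sketch of a more defensive extraction step, run on hand-written JSON strings rather than live responses. The helper name extract_location and both sample strings are illustrative assumptions; the field layout follows the results / geometry / location path used in geo.dsk(), and the sketch uses jsonlite instead of rjson:

```r
library(jsonlite)

# Hypothetical helper: pull lng/lat out of a Google-style geocoder
# response, returning NA values when the results array is empty
extract_location <- function(json_text) {
  # simplifyVector = FALSE keeps the nested list structure intact
  json <- fromJSON(json_text, simplifyVector = FALSE)
  if (length(json$results) == 0) {
    return(c(long = NA_real_, lat = NA_real_))
  }
  loc <- json$results[[1]]$geometry$location
  c(long = loc$lng, lat = loc$lat)
}

# Hand-written sample responses (not real API output); the coordinates
# are the Phoenix values from the result table above
hit  <- '{"results":[{"geometry":{"location":{"lat":33.4483333,"lng":-112.0733333}}}],"status":"OK"}'
miss <- '{"results":[],"status":"ZERO_RESULTS"}'

print(extract_location(hit))
print(extract_location(miss))
```

Returning NA instead of failing keeps one row per input address, so the output stays aligned with dff when a single lookup comes back empty.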

5
votes

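The ggmap package supports geocoding with either Google or the Data Science Toolkit; the latter uses their "Google-style geocoder". As noted in the previous answer, this is very slow for multiple addresses.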

library(ggmap)
result <- geocode(as.character(dff[[1]]), source = "dsk")
print(cbind(dff, result))
#                                     address        lon      lat
# 1        Birmingham, Alabama, United States  -86.80190 33.45641
# 2            Mobile, Alabama, United States  -88.10318 30.70114
# 3           Phoenix, Arizona, United States -112.07404 33.44838
# 4            Tucson, Arizona, United States -110.97087 32.21798
# 5      Little Rock, Arkansas, United States  -91.20736 33.60892
# 6       Berkeley, California, United States -122.29673 37.86058
# 7         Duarte, California, United States -118.29866 33.78659
# 8      Encinitas, California, United States -116.84605 33.01693
# 9       La Jolla, California, United States -117.87645 33.85751
# 10   Los Angeles, California, United States -117.88536 35.18713
# 11        Orange, California, United States -117.85311 33.78780
# 12  Redwood City, California, United States -117.88536 35.18713
# 13    Sacramento, California, United States -121.55541 38.38046
# 14 San Francisco, California, United States -117.88536 35.18713
# 15      Stanford, California, United States -122.16750 37.42509
# 16     Hartford, Connecticut, United States  -72.76356 41.78516
# 17    New Haven, Connecticut, United States  -72.92751 41.36571

0
votes

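To interact with the Data Science Toolkit's geocoding API and retrieve the latitude and longitude of multiple addresses, you can use the httr package in R to make the HTTP requests and handle the API responses. Here is a step-by-step guide: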

  1. First, install and load the necessary packages:
install.packages(c("httr", "jsonlite"))
library(httr)
library(jsonlite)
  2. Define a function that makes the API request and extracts the latitude and longitude for a given address:
geocode <- function(address) {
  url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  # Passing parameters via `query` lets httr URL-encode spaces and commas
  response <- GET(url, query = list(sensor = "false", address = address))
  result <- content(response, as = "parsed")
  # length() covers both a missing and an empty results element
  if (length(result$results) == 0) {
    return(NULL)
  }
  first <- result$results[[1]]
  return(c(first$formatted_address, first$geometry$location$lat, first$geometry$location$lng))
}
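  3. Use the purrr map() function to apply geocode() to each address in the data frame and collect the results in a list:
library(purrr)

# as.character() guards against the address column being a factor
results <- map(as.character(dff$address), geocode)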
  4. Convert the list of results into a data frame:
# Each non-NULL element is c(address, lat, lng); rbind stacks them into a
# character matrix and drops the NULL entries
mat <- do.call(rbind, results)
df_results <- data.frame(address = mat[, 1],
                         latitude = as.numeric(mat[, 2]),
                         longitude = as.numeric(mat[, 3]))
  5. Remove any rows with missing values:
df_results <- df_results[complete.cases(df_results), ]

Now df_results contains the address, latitude, and longitude for each location in the sample dataset that was geocoded successfully.
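
One detail worth spelling out is how a list of per-address results, some of which may be NULL, becomes a data frame that stays aligned with the input addresses. Here is a minimal sketch in base R only. The results list is hand-written stand-in data in the c(formatted_address, lat, lng) shape returned by geocode() above, not live API output. The coordinates are copied from the first answer's result table, while the formatted address strings and the unmatched address are made up for illustration:

```r
# Input addresses: two from the sample data plus a made-up one with no match
addresses <- c("Phoenix, Arizona, United States",
               "Nowhere, Nowhere",
               "Tucson, Arizona, United States")

# Stand-in for map(addresses, geocode): NULL marks a failed lookup
results <- list(
  c("Phoenix, AZ, USA", "33.4483333", "-112.0733333"),  # hypothetical formatted address
  NULL,
  c("Tucson, AZ, USA", "32.217975", "-110.970869")      # hypothetical formatted address
)

# Replace NULL entries with NA placeholders so no row is silently dropped
filled <- lapply(results, function(x) if (is.null(x)) rep(NA_character_, 3) else x)
mat <- do.call(rbind, filled)

df_results <- data.frame(
  query     = addresses,
  address   = mat[, 1],
  latitude  = as.numeric(mat[, 2]),
  longitude = as.numeric(mat[, 3]),
  stringsAsFactors = FALSE
)
print(df_results)
```

Keeping the original query next to each result makes it easy to see which addresses failed before deciding whether to drop them with complete.cases().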
