How can I download and/or extract data stored in a 'raw' binary zip object inside a response object in R?

Question · Votes: 0 · Answers: 2

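I can't download or read a zip file returned by an API request made with the httr package. Is there another package I could try that would let me download/read a binary zip file stored in the response of a GET request in R?

I tried two approaches:

  1. Used GET to obtain a response object of type application/json (successful), then used fromJSON on content(my_response, 'text') to extract the content. The output includes a column named "zip", which is the data I want to download; the documentation says it is a base64-encoded binary file. That column is currently a very long string of seemingly random letters, and I'm not sure how to convert it into the actual dataset.

  2. Tried to bypass fromJSON, because I noticed that the response object itself has a field of class 'raw'. That object is a list of seemingly random numbers, which I suspect is a binary representation of the dataset. I tried rawToChar(my_response$content) to convert the raw data to character, but this produces the same long string as in #1.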

Note: with approach #1, if I use base64_dec() to convert the long string, I get the same type of output as the 'raw' field in the response object itself.

getzip1  <- GET(getzip1_link)
getzip1 # successful response, status 200
df <- fromJSON(content(getzip1, "text"))

df$status # "OK"
df$dataset$zip # <- this is the very long string of letters (eg. "I1NC5qc29uUEsBAhQDFA...")

# Method 1: try to convert from the 'zip' object in the output of fromJSON
try1 <- base64_dec(df$dataset$zip)
# looks similar to getzip1$content (i.e. this produces a vector of bytes: 50 4b 03 04 14 00, etc., perhaps a binary representation)

# Method 2: try to get data directly from raw object
class(getzip1$content) # <- 'raw' class object directly from GET request
try2 <- rawToChar(getzip1$content) # returns the same output as df$dataset$zip


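I should be able to use either the raw 'content' object in the response or the long string in the 'zip' object from the fromJSON output to view the dataset or download it somehow. I don't know how to do that. Please help!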

r get zip response httr
2 Answers
0
votes

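Welcome!

Based on the API's documentation, the response of the getDataset endpoint has the following schema:

Dataset archive including meta-information; the dataset itself is base64-encoded to allow binary ZIP transfer.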

{
 "status": "OK",
 "dataset": {
 "state_id": 5,
 "session_id": 1624,
 "session_name": "2019-2020 Regular Session",
 "dataset_hash": "1c7d77fe298a4d30ad763733ab2f8c84",
 "dataset_date": "2018-12-23",
 "dataset_size": 317775,
 "mime": "application\/zip",
 "zip": "MIME 64 Encoded Document"
 }
}
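
To make the schema concrete, here is a minimal offline sketch of the decoding step. It assumes only the jsonlite package. The names payload and fake_body are made up for illustration, and the payload is not a real archive: it is just the six bytes quoted in the question plus padding, which happen to start with the ZIP local-file-header signature 50 4b 03 04 ("PK").

```r
library(jsonlite)

# Stand-in payload (hypothetical): the bytes quoted in the question plus padding.
# This is NOT a valid archive; it only mimics how a ZIP file begins.
payload <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00))

# Hypothetical response body shaped like the schema above
fake_body <- toJSON(
  list(status = "OK",
       dataset = list(mime = "application/zip", zip = base64_enc(payload))),
  auto_unbox = TRUE
)

# Parse the JSON text, then turn the base64 string back into raw bytes
parsed  <- fromJSON(fake_body)
decoded <- base64_dec(parsed$dataset$zip)

# A ZIP file that begins with a local file header starts with 50 4b 03 04
is_zip <- identical(decoded[1:4], as.raw(c(0x50, 0x4b, 0x03, 0x04)))

# Raw bytes are written to disk with writeBin(), not writeLines()
destfile <- tempfile(fileext = ".zip")
writeBin(decoded, destfile)
```

This also suggests why base64_dec() in the question already produced the right thing: a raw vector beginning with 50 4b 03 04 is very likely the zip file itself and only needs to be written to disk.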

We can fetch the data in R with the following code:

library(httr)
library(jsonlite)
library(stringr)
library(maditr)
token <- "" # Your API key
session_id <- 1253L # Obtained from the getDatasetList endpoint
access_key <- "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile <- file.path("path", "to", "file.zip") # Modify
response <- str_c("https://api.legiscan.com/?key=",
                  token,
                  "&op=getDataset&id=",
                  session_id,
                  "&access_key=",
                  access_key) %>%
  GET()
status_code(x = response) == 200 # Good
body <- content(x = response,
                as = "text",
                encoding = "utf8") %>%
  fromJSON() # This contains some extra metadata
content(x = response,
        as = "text",
        encoding = "utf8") %>%
  fromJSON() %>%
  getElement(name = "dataset") %>%
  getElement(name = "zip") %>%
  base64_dec() %>%
  writeBin(con = destfile)
unzip(zipfile = destfile)

unzip will extract the files, which in this case look like:

hash.md5 # Can be checked against the metadata
AL/2016-2016_1st_Special_Session/bill/*.json
AL/2016-2016_1st_Special_Session/people/*.json
AL/2016-2016_1st_Special_Session/vote/*.json

As always, wrap the code in a function and profit.
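
One possible way to do that wrapping is sketched below. The function names save_dataset_zip and get_dataset are hypothetical, not part of httr or of the API; the request is built with httr::GET and its query argument instead of pasting the URL together, and the status check on the JSON body assumes the "status": "OK" field shown in the schema.

```r
library(jsonlite)

# Hypothetical helper: decode the base64 "zip" field of a getDataset-style
# JSON body and write the bytes to destfile. Returns destfile invisibly.
save_dataset_zip <- function(json_text, destfile) {
  body <- fromJSON(json_text)
  if (!identical(body$status, "OK")) {
    stop("API returned status: ", body$status)
  }
  writeBin(base64_dec(body$dataset$zip), con = destfile)
  invisible(destfile)
}

# Hypothetical wrapper around the request itself (requires the httr package).
get_dataset <- function(token, session_id, access_key, destfile) {
  response <- httr::GET(
    "https://api.legiscan.com/",
    query = list(key = token, op = "getDataset",
                 id = session_id, access_key = access_key)
  )
  httr::stop_for_status(response)  # error on any non-success HTTP status
  json_text <- httr::content(response, as = "text", encoding = "UTF-8")
  save_dataset_zip(json_text, destfile)
}
```

With a valid key, a call such as get_dataset(token, 1253L, access_key, destfile) followed by unzip(zipfile = destfile) should behave like the script above. Keeping the decoding in its own function means it can be tried on a saved JSON body without touching the network.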

PS: This is how the code compares in Julia.

using Base64, HTTP, JSON3, CodecZlib
token = "" # Your API key
session_id = 1253 # Obtained from the getDatasetList endpoint
access_key = "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile = joinpath("path", "to", "file.zip") # Modify
response = string("https://api.legiscan.com/?",
                  join(["key=$token",
                        "op=getDataset",
                        "id=$session_id",
                        "access_key=$access_key"],
                        "&")) |>
    HTTP.get
@assert response.status == 200
JSON3.read(response.body) |>
    (content -> content.dataset.zip) |>
    base64decode |>
    (data -> write(destfile, data))
run(`unzip $destfile`)

-1
votes

See the answer on how to open a zip file downloaded from a URL:


Getting a zip file with httr
