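Extracting tidy data from a simple JSON is straightforward with the tidyjson package (https://cran.r-project.org/web/packages/tidyjson/vignettes/introduction-to-tidyjson.html).
I have not managed to apply the same logic to a complex nested JSON structure. Questions like this one (How to extract data from nested JSON data in R?) are too specific for me to generalize to other cases.
A more general case is given by the following structure (for a working reproducible example, see section 1.4, the example request, at https://www.ree.es/en/apidatos):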
{
  "data": {
    "type": "WIDGET TYPE",
    "id": "WIDGET_ID",
    "attributes": {
      "title": "WIDGET NAME",
      "last-update": "2019-02-01T08:26:34.000+01:00",
      "description": "WIDGET DESCRIPTION"
    },
    "meta": {
      "cache-control": {
        "cache": "HIT",
        "expireAt": "2019-03-01T17:18:22"
      }
    }
  },
  "included": [
    {
      "type": "INDICATOR_1 TYPE",
      "id": "INDICADOR_1_ID",
      "groupId": null,
      "attributes": {
        "title": "INDICADOR_1 NAME",
        "description": "INDICADOR_1 DESCRIPTION",
        "color": "#2fa688",
        "type": "INDICADOR_1 TYPE",
        "magnitude": "INDICADOR_1 MAGNITUDE",
        "composite": false,
        "last-update": "2019-02-19T08:26:34.000+01:00",
        "values": [
          {
            "value": 12345,
            "percentage": "VALUE BETWEEN 0 AND 1",
            "datetime": "2019-02-04T20:44:00.000+01:00"
          }
        ]
      }
    },
    {
      "type": "INDICATOR_2 TYPE",
      "id": "INDICADOR_1_ID",
      "groupId": null,
      "attributes": {
        …
      }
    }
  ]
}
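The first level has an object "data" and an array "included".
The "included" array has one object per indicator.
Each of these objects has an "attributes" object, which holds a "values" array where the final data live: "value", "percentage" and "datetime".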
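Before writing a tidyjson pipeline, it can help to confirm this layout on a parsed copy of the response. A minimal sketch, assuming the jsonlite package is installed; the `sample_json` string is a made-up, shortened stand-in for the real response, not actual API output:

```r
library(jsonlite)

# Hypothetical, shortened stand-in for the API response (not real data)
sample_json <- '{
  "data": {"type": "WIDGET TYPE", "id": "WIDGET_ID"},
  "included": [
    {
      "type": "INDICATOR_1 TYPE",
      "id": "INDICATOR_1_ID",
      "attributes": {
        "title": "INDICATOR_1 NAME",
        "values": [
          {"value": 12345, "percentage": 0.5, "datetime": "2019-02-04T20:44:00.000+01:00"}
        ]
      }
    }
  ]
}'

# Parse into nested R lists, without simplifying arrays into vectors or data frames
parsed <- fromJSON(sample_json, simplifyVector = FALSE)

# Keys at the first level, then the fields of one element of "values"
top_level_keys <- names(parsed)
first_indicator <- parsed$included[[1]]
value_fields <- names(first_indicator$attributes$values[[1]])

# Print the nesting down to the level of each indicator
str(parsed, max.level = 3)
```

Each level seen here (the "included" array, then "attributes", then "values") corresponds to one object or array that has to be entered when flattening the data.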
The goal is to extract the data into a tidy data frame with the columns "type", "title", "value", "percentage" and "datetime".
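I found the solution after some trial and error. I am answering my own question in case it helps others who are starting out with JSON objects.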
library(tidyjson)
library(dplyr)  # provides select()

json %>%                                   # our JSON object
  enter_object(included) %>%               # enter the array that holds the indicators
  gather_array() %>% spread_all() %>%      # one row per indicator, scalar fields as columns
  select(attributes.title) %>%             # keep only this variable
  enter_object(attributes, values) %>%     # enter the array with the final data ("values" sits under "attributes")
  gather_array() %>% spread_all() %>%      # same as before: one row per value
  select(indicator = attributes.title, value, percentage, datetime)  # select the final data
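It is basically the same process, enter_object %>% gather_array %>% spread_all %>% select, repeated twice. You only need to state which object to enter at each level and which pieces of information to select.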
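The question also asked for a "type" column, which the pipeline above does not return. One possible variant is sketched below. It swaps the first spread_all() and select() pair for spread_values() with jstring() paths, which is my substitution rather than part of the original answer. The `sample_json` string and the index column names are made up for illustration.

```r
library(tidyjson)
library(dplyr)

# Hypothetical sample with two indicators (not real API output)
sample_json <- '{
  "included": [
    {
      "type": "TYPE A",
      "attributes": {
        "title": "Indicator 1",
        "values": [
          {"value": 10, "percentage": 0.25, "datetime": "2019-02-04T20:00:00.000+01:00"},
          {"value": 30, "percentage": 0.75, "datetime": "2019-02-04T21:00:00.000+01:00"}
        ]
      }
    },
    {
      "type": "TYPE B",
      "attributes": {
        "title": "Indicator 2",
        "values": [
          {"value": 5, "percentage": 1, "datetime": "2019-02-04T20:00:00.000+01:00"}
        ]
      }
    }
  ]
}'

tidy_values <- sample_json %>%
  enter_object(included) %>%                          # level 1: the array of indicators
  gather_array(column.name = "indicator_index") %>%   # one row per indicator
  spread_values(
    type  = jstring(type),                            # top-level field of each indicator
    title = jstring(attributes, title)                # nested field, given as a path
  ) %>%
  enter_object(attributes, values) %>%                # level 2: the array of values
  gather_array(column.name = "value_index") %>%       # one row per value
  spread_all() %>%                                    # value, percentage, datetime
  select(type, title, value, percentage, datetime)

tidy_values
```

Giving each gather_array() call its own column name avoids two index columns with the same default name once the first select() is no longer there to drop one of them.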