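I have been working with nested lists lately and can extract data from deep levels. I ran into a small problem with the tidyr function hoist(). I can extract the latitude and longitude of the 7 addresses in 5 cities using two separate commands. I would like to know whether hoist() can access the list structure so that extracting lat and lng takes only one command. Here is an example: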
library(tidyr)
library(dplyr)
library(repurrrsive)
gmaps_cities_o <- repurrrsive::gmaps_cities
gmaps_cities_o
Output:
A tibble: 5 × 2
city json
<chr> <list>
Houston <list [2]>
Washington <list [2]>
New York <list [2]>
Chicago <list [2]>
Arlington <list [2]>
5 rows
To extract lat and lng I have to write two pieces of code:
# extract lat, long for the first address
gmaps_cities_o %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
Output:
A tibble: 5 × 4
city lat lng json
<chr> <dbl> <dbl> <list>
Houston 29.76043 -95.36980 <list [2]>
Washington 47.75107 -120.74014 <list [2]>
New York 40.71278 -74.00597 <list [2]>
Chicago 41.87811 -87.62980 <list [2]>
Arlington 32.73569 -97.10807 <list [2]>
5 rows
For the second address:
# extract lat, long for the second address
gmaps_cities_o %>%
  hoist(json,
    lat = list("results", 2, "geometry", "location", "lat"),
    lng = list("results", 2, "geometry", "location", "lng")
  )
Output:
A tibble: 5 × 4
city lat lng json
<chr> <dbl> <dbl> <list>
Houston NA NA <list [2]>
Washington 38.90719 -77.03687 <list [2]>
New York NA NA <list [2]>
Chicago NA NA <list [2]>
Arlington 38.87997 -77.10677 <list [2]>
5 rows
So two separate operations are needed to get lat and lng for the 7 addresses across the 5 cities.
I can extract lat and lng for all addresses with this code:
gmaps_cities_o %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  hoist(results,
    lat = list("geometry", "location", "lat"),
    lng = list("geometry", "location", "lng")
  ) %>%
  select(city, lat, lng)
Output:
A tibble: 7 × 3
city lat lng
<chr> <dbl> <dbl>
Houston 29.76043 -95.36980
Washington 47.75107 -120.74014
Washington 38.90719 -77.03687
New York 40.71278 -74.00597
Chicago 41.87811 -87.62980
Arlington 32.73569 -97.10807
Arlington 38.87997 -77.10677
7 rows
But I cannot do it with a single hoist() call, which does not seem quite right. I am looking for something like this:
gmaps_cities_o %>%
  hoist(json,
    lat = list("results", (?), "geometry", "location", "lat"),
    lng = list("results", (?), "geometry", "location", "lng")
  )
Could someone with experience in nested lists give me a hint?
Thanks.
If you are willing to use base R's rapply (which recursively applies a function over a list) instead of hoist, you can do:
library(dplyr)
library(repurrrsive)
gmaps_cities_o |>
  rowwise() |>
  reframe(city = city,
          prop_value = json |> rapply(f = \(x) x),
          prop_name = names(prop_value)
  ) |>
  filter(grepl('results\\.geometry\\.location', prop_name))
which gives:
## # A tibble: 21 x 3
## city prop_value prop_name
## <chr> <chr> <chr>
## 1 Houston 29.7604267 results.geometry.location.lat
## 2 Houston -95.3698028 results.geometry.location.lng
## 3 Houston APPROXIMATE results.geometry.location_type
## 4 Washington 47.7510741 results.geometry.location.lat
## 5 Washington -120.7401386 results.geometry.location.lng
## 6 Washington APPROXIMATE results.geometry.location_type
## 7 Washington 38.9071923 results.geometry.location.lat
## 8 Washington -77.0368707 results.geometry.location.lng
## 9 Washington APPROXIMATE results.geometry.location_type
## 10 New York 40.7127753 results.geometry.location.lat
## # i 11 more rows
## # i Use `print(n = ...)` to see more rows
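The result above is in long form, stores the coordinates as character strings, and also keeps the location_type rows because the pattern matches them too. As a rough sketch of one way to get back to one row per address, the pipeline below tightens the pattern and pivots to wide form. The names coord, address, and coords_wide are made up for illustration, and the pairing of lat with lng assumes rapply returns them in address order within each city, as the output above suggests.

```r
library(dplyr)
library(tidyr)
library(repurrrsive)

coords_wide <- gmaps_cities |>
  rowwise() |>
  reframe(city = city,
          prop_value = rapply(json, f = \(x) x),  # flatten each city's nested list
          prop_name = names(prop_value)) |>
  # keep only the location lat/lng leaves (drops location_type, bounds, viewport)
  filter(grepl("results\\.geometry\\.location\\.(lat|lng)$", prop_name)) |>
  mutate(coord = sub(".*\\.", "", prop_name),     # "lat" or "lng"
         prop_value = as.numeric(prop_value)) |>
  # number the addresses within each city (assumes lat/lng appear in address order)
  group_by(city, coord) |>
  mutate(address = row_number()) |>
  ungroup() |>
  pivot_wider(id_cols = c(city, address),
              names_from = coord,
              values_from = prop_value)

coords_wide
```

This keeps the rapply idea but ends with numeric lat and lng columns, plus an address index that records which result each row came from.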
Here is a lazy approach.
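p <- NULL  # accumulator for the combined rows
for (i in 1:2) {  # at most two results per city in this data
  p <- rbind(
    gmaps_cities_o %>%
      hoist(json,
        lat = list("results", i, "geometry", "location", "lat"),
        lng = list("results", i, "geometry", "location", "lng")
      ),
    p
  )
}
p  # cities without a second result appear with NA lat/lng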
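The loop above hard-codes two results per city and leaves NA rows for cities that have only one. A possible variation, sketched here under the assumption that every json element has a results list, works out the maximum number of results first and then drops the empty rows. The names n_max, address, and coords are illustrative only.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(repurrrsive)

# largest number of results found for any city
n_max <- max(map_int(gmaps_cities$json, \(j) length(j$results)))

coords <- map(seq_len(n_max), \(i) {
  gmaps_cities |>
    hoist(json,
          lat = list("results", i, "geometry", "location", "lat"),
          lng = list("results", i, "geometry", "location", "lng")) |>
    mutate(address = i) |>          # remember which result this row came from
    select(city, address, lat, lng)
}) |>
  bind_rows() |>
  filter(!is.na(lat))               # drop cities that have no i-th result

coords
```

Each hoist() call still plucks a single index, so this remains several calls under the hood; it only removes the hard-coded 2 and the NA rows.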