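I am automating the process of downloading a KML file from the Internet and extracting data from it. The values I am trying to extract sit inside one large string, and I can't figure out how to pull out the single value I need.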
library(sf)

# basin names to keep
basins <- c('CARSON - CRSN CITY L (STWN2LLF)',
            'CARSON - CRSN CITY L (STWN2LUF)',
            'EF CARSON - GRDNVL L (GRDN2LLF)',
            'EF CARSON - GRDNVL L (GRDN2LUF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HLF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HUF)',
            'WF CARSON - WOODFRDS (WOOC1HOF)')
date <- 'Mar_04_2023'
# build the archive URL for the chosen date
url_upper <- paste0('https://www.cnrfc.noaa.gov/archive/sweBasins/SWEbasinsVal_', date, '.kml')
kml_upper <- st_read(url_upper)
# keep only the basins of interest
kml_upper <- subset(kml_upper, kml_upper$Name %in% basins)
kml_upper$geometry <- NULL  # drop the geometry column
head(kml_upper, 3)
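I need to extract the 43.81 in. value, which comes right after "...Simulated Basin Snow Water Equivalent for Mar 04, 2023". I couldn't paste the actual text because SO interprets it as formatting.
What is the best way to extract the data?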
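It looks like the information you are after is stored as an HTML table inside the XML/KML code. I believe sf has a way to extract the information.
This answer does not use that. Instead, I use the xml2 library to pull the HTML out of the XML, then use rvest to parse that text as HTML and extract the table into a usable form.
Of the basins in your list, I could find only 3 in the downloaded file.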
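A quick way to see which requested basins matched and which did not is a set comparison. This is only a sketch: `namesOfPlaces` here is a hand-typed stand-in containing the three names that matched, whereas in the real run it holds every placemark name read from the KML (see the code that follows).

```r
basins <- c('CARSON - CRSN CITY L (STWN2LLF)',
            'CARSON - CRSN CITY L (STWN2LUF)',
            'EF CARSON - GRDNVL L (GRDN2LLF)',
            'EF CARSON - GRDNVL L (GRDN2LUF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HLF)',
            'EF CARSON-MRKLEEVLLE (CEMC1HUF)',
            'WF CARSON - WOODFRDS (WOOC1HOF)')

# Stand-in (assumption): only the names that matched, typed by hand.
# In practice this vector comes from the <name> of every Placemark.
namesOfPlaces <- c('EF CARSON-MRKLEEVLLE (CEMC1HUF)',
                   'EF CARSON - GRDNVL L (GRDN2LUF)',
                   'CARSON - CRSN CITY L (STWN2LUF)')

found   <- intersect(basins, namesOfPlaces)  # requested and present
missing <- setdiff(basins, namesOfPlaces)    # requested but absent

print(found)
print(missing)
```

The comparison is an exact string match, so differences in spacing or capitalization count as a miss.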
library(dplyr)
library(rvest)
library(xml2)

# url_upper is the KML address built in the question
page <- xml2::read_xml(url_upper)
# xml2::xml_ns(page)
xml_ns_strip(page)  # strip the namespace so plain XPath names work

# find all of the placemarks
places <- page %>% xml_find_all(".//Placemark")
# get the name of each placemark
namesOfPlaces <- places %>% xml_find_first(".//name") %>% xml_text()
# indices of the placemarks whose names are in the basin list (3 in this case)
placesOfInterest <- which(namesOfPlaces %in% basins)
# one description per placemark; xml_find_first keeps them aligned with the names
descriptionsInPlaces <- places %>% xml_find_first(".//description") %>% xml_text()
# reduce the descriptions down to the ones of interest
descriptionsInPlaces <- descriptionsInPlaces[placesOfInterest]

# loop through the descriptions, parsing each as HTML and extracting its tables
answer <- lapply(descriptionsInPlaces, function(node) {
  convertHTML <- read_html(node)
  convertHTML %>% html_elements("table") %>% html_table()
})
names(answer) <- namesOfPlaces[placesOfInterest]
# answer is a named list of tables with the requested information
Here is the resulting data. I leave it to the reader to extract the first row of each data frame in the list.
answer
$`EF CARSON-MRKLEEVLLE (CEMC1HUF)`
$`EF CARSON-MRKLEEVLLE (CEMC1HUF)`[[1]]
# A tibble: 3 × 2
X1 X2
<chr> <chr>
1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 43.81 in.
2 SWE Percent of Normal: 207%
3 Average month-to-date SWE through Mar 04, 2023: 21.12 in.
$`EF CARSON - GRDNVL L (GRDN2LUF)`
$`EF CARSON - GRDNVL L (GRDN2LUF)`[[1]]
# A tibble: 3 × 2
X1 X2
<chr> <chr>
1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 16.07 in.
2 SWE Percent of Normal: 429%
3 Average month-to-date SWE through Mar 04, 2023: 3.75 in.
$`CARSON - CRSN CITY L (STWN2LUF)`
$`CARSON - CRSN CITY L (STWN2LUF)`[[1]]
# A tibble: 3 × 2
X1 X2
<chr> <chr>
1 Simulated Basin Snow Water Equivalent for Mar 04, 2023: 15.26 in.
2 SWE Percent of Normal: 307%
3 Average month-to-date SWE through Mar 04, 2023: 4.97 in.
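The step left to the reader could look something like the sketch below. To keep it runnable without a download, it rebuilds a mock `answer` with the same shape and values as the output above, using plain data frames instead of tibbles; `make_table` is a hypothetical helper used only to build that mock.

```r
# Hypothetical helper: builds one two-column table like those printed above.
make_table <- function(swe, pct, avg) {
  data.frame(
    X1 = c("Simulated Basin Snow Water Equivalent for Mar 04, 2023:",
           "SWE Percent of Normal:",
           "Average month-to-date SWE through Mar 04, 2023:"),
    X2 = c(swe, pct, avg),
    stringsAsFactors = FALSE
  )
}

# Mock of `answer`: a named list, each element a list holding one table.
answer <- list(
  "EF CARSON-MRKLEEVLLE (CEMC1HUF)" = list(make_table("43.81 in.", "207%", "21.12 in.")),
  "EF CARSON - GRDNVL L (GRDN2LUF)" = list(make_table("16.07 in.", "429%", "3.75 in.")),
  "CARSON - CRSN CITY L (STWN2LUF)" = list(make_table("15.26 in.", "307%", "4.97 in."))
)

# Take row 1 of column X2 from each table and strip the trailing unit.
swe_in <- vapply(answer, function(tbls) {
  as.numeric(sub("[[:space:]]*in\\.?$", "", tbls[[1]]$X2[1]))
}, numeric(1))

# One row per basin, with the SWE value in inches as a number.
swe <- data.frame(basin = names(swe_in), swe_in = unname(swe_in),
                  stringsAsFactors = FALSE)
print(swe)
```

This assumes the simulated SWE is always the first row and always ends in "in."; if the table layout varies between dates, matching on the label text in `X1` would be safer than relying on row position.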