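I am new to R and am trying to combine and parse several XML documents. I imported a CSV in which one column contains 178 XML addresses. I want to "fetch" those XML addresses, combine them into one large XML file, and parse it into a data frame. Finally, I want to export this data frame as CSV.
I have installed the XML and xml2 packages. Following a tutorial, I then tried the xmlTreeParse function on a single XML address (http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml). I have also imported the CSV with the 178 addresses. But I don't know how to get from what I have here to the data frame.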
# Load the necessary packages (install them first with install.packages() if needed)
library(XML)
library(xml2)
# Save the URL of the xml file in a variable
xml.url <- "http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml"
# Use the xmlTreeParse function to parse the XML file directly from the web
xmlfile <- xmlTreeParse(xml.url)
# The xml file is now saved as an object you can easily work with in R
class(xmlfile)
# Use the xmlRoot function to access the top node
xmltop <- xmlRoot(xmlfile)
# Have a look at the XML code of the first two subnodes
print(xmltop[1:2])
# To extract the XML-values from the document, use xmlSApply
devcoafgh <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
# Finally, get the data in a data-frame and have a look at the first rows and columns (PROBLEM)
devcoafgh_df <- data.frame(t(devcoafgh),row.names=NULL)
devcoafgh_df[1:5,1:4]
# Just 3 tests
print(devcoafgh)
print(xmlfile)
write.csv(devcoafgh_df, file = "afghdata.csv")
# Tests done
# Import data containing all XML addresses
xmladdresses <- read.csv("xml_addresses.csv")
# Extract the URL column (column 5) as a vector rather than a one-column data frame
xmlurls <- xmladdresses[[5]]
# Keep all URLs (178 in total) in one object
xml.list <- xmlurls
In the end, I expect to have one large data frame compiled from the 178 XML files, which I can parse and export.
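I'm not sure this is exactly what you want, but for your example with a single XML file, this creates a tibble containing all the information (anything missing is simply filled with NA).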
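library(tidyverse)
# Turn each record into a two-column tibble (name, value) with unique field names
devcoafgh_ldf <- lapply(devcoafgh, function(lst) {
  tb <- enframe(unlist(lst))
  tb$name <- make.names(tb$name, unique = TRUE)
  return(tb)
})
# Join the records on field name, then reshape to one row per record and one column per field
devcoafgh_df <- devcoafgh_ldf %>%
  reduce(left_join, by = "name") %>%
  gather(variable, value, -name) %>%
  spread(name, value)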
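To go from one file to all 178, one possible approach is to wrap the parsing in a function and apply it to every address, then stack the results. The following is only a rough sketch under some assumptions: it uses xml2 and dplyr instead of xmlTreeParse, `parse_one` is a hypothetical helper name, each child of the root node is treated as one record, and only the text of each record's direct children is kept (attributes and deeper nesting are flattened). The two inline XML strings are made-up stand-ins so the example runs offline.

```r
library(xml2)
library(dplyr)

# Hypothetical helper: parse one XML source (a URL, file path, or literal XML string)
# into a data frame with one row per child of the root node.
parse_one <- function(src) {
  doc <- read_xml(src)
  records <- xml_children(doc)  # children of the root element
  rows <- lapply(records, function(rec) {
    fields <- xml_children(rec)
    vals <- xml_text(fields)
    # Repeated field names within a record become name, name.1, name.2, ...
    names(vals) <- make.unique(xml_name(fields))
    as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  })
  # bind_rows fills fields that are missing from a record with NA
  bind_rows(rows)
}

# Made-up stand-ins for the real addresses (assumption: replace with the URL vector)
sources <- c(
  "<acts><act><id>A1</id><title>x</title></act><act><id>A2</id></act></acts>",
  "<acts><act><id>B1</id><budget>5</budget></act></acts>"
)

# One data frame for all sources; the 'source' column records which input each row came from
all_df <- bind_rows(lapply(sources, parse_one), .id = "source")
```

With the real data, `sources` would be a character vector of the 178 URLs taken from the CSV, and `write.csv(all_df, "all_files.csv", row.names = FALSE)` would export the result. Wrapping the call to `parse_one` in `tryCatch` may be worthwhile if some of the addresses could be unreachable.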