R: Aggregating several large XML files into a single data frame

Question

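I am new to R and am trying to combine and parse several XML files. I imported a CSV in which one column contains 178 XML addresses (URLs). I want to fetch those addresses, combine them into one large XML file, and parse that into a data frame. Eventually I want to export this data frame as a CSV.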

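I have installed the XML and xml2 packages. Following a tutorial, I then tried the xmlTreeParse function on a single XML address (http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml). I also imported the CSV with the 178 addresses. But I don't know how to get from what I have here to the data frame I want.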

# Load the necessary packages (install them first with install.packages() if needed)
library(XML)
library(xml2)

# Save the URL of the xml file in a variable
xml.url <- "http://ec.europa.eu/europeaid/files/iati/XI-IATI-EC_DEVCO_C_AG.xml"

# Use the xmlTreeParse function to parse the xml file directly from the web
xmlfile <- xmlTreeParse(xml.url)

# The xml file is now saved as an object you can easily work with in R
class(xmlfile)

# Use the xmlRoot function to access the top node
xmltop <- xmlRoot(xmlfile)

# Have a look at the XML code of the first two subnodes
print(xmltop[1:2])

# To extract the XML-values from the document, use xmlSApply
devcoafgh <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

# Finally, get the data into a data frame and look at the first rows and columns (PROBLEM)
devcoafgh_df <- data.frame(t(devcoafgh), row.names = NULL)
devcoafgh_df[1:5, 1:4]

# Just a couple of tests
print(devcoafgh)
print(xmlfile)

write.csv(devcoafgh_df, file = "afghdata.csv")

# Tests done
# Import the data containing all XML addresses
xmladdresses <- read.csv("xml_addresses.csv", stringsAsFactors = FALSE)

# Keep just the column holding the URLs (the 5th), as a character vector
xmlurls <- xmladdresses[[5]]

# Collect all URLs contained in this variable (178 in total) in a list
xml.list <- as.list(xmlurls)

In the end, I expect to have one large data frame compiled from the 178 XML files, which I can then parse and export.

r xml csv
1 Answer

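I'm not sure whether this is what you want, but for your example with a single XML file, this creates a tibble containing all the information (anything that is missing is simply filled in with NA):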

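library(tidyverse)

# Turn each parsed record into a two-column tibble (name, value) with unique names
devcoafgh_ldf <- lapply(devcoafgh, function(lst) {
  tb <- enframe(unlist(lst))
  tb$name <- make.names(tb$name, unique = TRUE)
  tb
})

# Join all records on "name", then reshape so that each record becomes one row
devcoafgh_df <- devcoafgh_ldf %>%
  reduce(left_join, by = "name") %>%
  gather(variable, value, -name) %>%
  spread(name, value)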
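
To extend this to all 178 files, one option is to parse each address separately and stack the results, rather than first building one giant XML file. The following is only a rough sketch using xml2 and base R, not a tested solution for the IATI files. The helper names (record_to_row, bind_fill, xml_to_df, combine_sources) and the tiny demo documents are made up for illustration. It assumes that each child of the root element is one record and that the text of each direct child element is the value you want. Attributes are ignored, and the text of nested elements is pasted together.

```r
library(xml2)

# One record node -> one-row data frame, one column per direct child element.
# Repeated child names are made unique (title, title.1, ...).
record_to_row <- function(node) {
  kids <- xml_children(node)
  vals <- xml_text(kids, trim = TRUE)
  names(vals) <- make.names(xml_name(kids), unique = TRUE)
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
}

# Stack data frames that may have different columns, filling gaps with NA.
bind_fill <- function(dfs) {
  dfs <- Filter(function(d) nrow(d) > 0, dfs)
  if (length(dfs) == 0) return(data.frame())
  cols <- unique(unlist(lapply(dfs, names)))
  filled <- lapply(dfs, function(d) {
    for (nm in setdiff(cols, names(d))) d[[nm]] <- NA_character_
    d[cols]
  })
  out <- do.call(rbind, filled)
  rownames(out) <- NULL
  out
}

# Parse one source (file path, URL, or literal XML string) into a data frame
# with one row per child of the root element.
xml_to_df <- function(src) {
  records <- xml_children(read_xml(src))
  bind_fill(lapply(seq_along(records), function(i) record_to_row(records[[i]])))
}

# Parse many sources; a source that fails to load is skipped with a warning.
combine_sources <- function(sources) {
  dfs <- lapply(sources, function(src) {
    tryCatch(xml_to_df(src), error = function(e) {
      warning("Skipping one source: ", conditionMessage(e))
      NULL
    })
  })
  bind_fill(Filter(Negate(is.null), dfs))
}

# Demo with two small, made-up documents standing in for the real URLs
demo_sources <- c(
  "<activities><activity><id>A1</id><title>Roads</title></activity><activity><id>A2</id></activity></activities>",
  "<activities><activity><id>B1</id><budget>100</budget></activity></activities>"
)
combined <- combine_sources(demo_sources)
print(combined)

# With the real data this would be, roughly:
# combined <- combine_sources(as.character(xmladdresses[[5]]))
# write.csv(combined, "all_activities.csv", row.names = FALSE)
```

Parsing file by file keeps memory use lower than merging everything into one document first, and a single broken address does not stop the whole run. If you need attributes or nested elements as separate columns, record_to_row is the place to adapt.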