Mass editing XML with Ruby's Nokogiri


I am using Nokogiri to convert a fairly large XML file to CSV. My client has more than 80,000 rows. I have two questions.

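First question. I need to mass edit a node so that it becomes something like (www.mybaseurl.com + the text of <ImageFile />), so that it holds the full image path. I have looked all over the Nokogiri documentation and Stack Overflow, and I still cannot find a solution to my problem.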

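Second question. I want Ruby to check whether the column is blank and, if it is not, I need the code to create a new row underneath with the same handle value, but with the value of <AltImageFile1> in place of <ImageFile />. Something like this: [image]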

Here is a sample of the XML file I am working with:

    <Products>
      <Product>
        <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
        <Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
        <ImageFile>100024.jpg</ImageFile>
        <AltImageFile1>103387-1.jpg</AltImageFile1>
        <ItemNumber>100024</ItemNumber>
        <ModelNumber>64707</ModelNumber>
      </Product>
      <Product>
        <Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
        <Description>Classic chrome finish with ABS plastic top &amp; body includes push rod, no overflow.</Description>
        <ImageFile>100024.jpg</ImageFile>
        <AltImageFile1>103429-1.jpg</AltImageFile1>
        <ItemNumber>100024</ItemNumber>
        <ModelNumber>64707</ModelNumber>
      </Product>
      <Product>
        <Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
        <Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high &amp; low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>
        <ImageFile>100073.jpg</ImageFile>
        <AltImageFile1/>
        <ItemNumber>100024</ItemNumber>
        <ModelNumber>64707</ModelNumber>
      </Product>
    </Products>

Here is my Ruby code. How can I improve it? I would appreciate any explanation.

    require 'csv'
    require 'nokogiri'

    xml = File.read('Desktop/roduct_catalog.xml')
    doc = Nokogiri::XML(xml)
    all_the_things = []

    doc.xpath('//Products/Product').each do |file|
      handle = file.xpath("./ItemNumber").first.text
      title = file.xpath("./Name").first.text
      description = file.xpath("./Description").first.text
      collection = file.xpath("./FLDeptName").first.text
      image1 = file.xpath("./ImageFile").first.text
      all_the_things << [handle, title, description, collection, image1]
    end

    CSV.open('product_file_1.csv', 'wb') do |row|
      row << ['handle', 'title', 'description', 'collection', 'image1']
      all_the_things.each do |data|
        row << data
      end
    end

ruby xml csv xpath nokogiri
1 answer
Here is code you can try. I do not see an FLDeptName node in the XML, so I commented out the lines that depend on that node.

    require 'csv'
    require 'nokogiri'

    xml = File.read('roduct_catalog.xml')
    doc = Nokogiri::XML(xml)
    all_the_things = []

    doc.xpath('//Products/Product').each do |file|
      handle = file.xpath("./ItemNumber").first.text
      title = file.xpath("./Name").first.text
      description = file.xpath("./Description").first.text
      # Commented out because the ./FLDeptName node is not present:
      # collection = file.xpath("./FLDeptName").first.text
      # Prefix the file name with the base URL to get the full image path.
      image1 = "www.mybaseurl.com/" + file.xpath("./ImageFile").first.text
      # all_the_things << [handle, title, description, collection, image1]
      all_the_things << [handle, title, description, image1]
    end

    CSV.open('product_file_1.csv', 'wb') do |row|
      # Header without 'collection', since ./FLDeptName is not present.
      row << ['handle', 'title', 'description', 'image1']
      all_the_things.each do |data|
        row << data
      end
    end

Here is the output. [image]

Edit 1: Based on the comments.

Sample XML with 2 images

    <?xml version="1.0"?>
    <ProductCatalogImport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Products>
        <Product>
          <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
          <Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
          <ImageFile>100024.jpg</ImageFile>
          <ImageFile2>100024-2.jpg</ImageFile2>
          <ItemNumber>100024</ItemNumber>
          <ModelNumber>64707</ModelNumber>
        </Product>
        <Product>
          <Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
          <Description>Classic chrome finish with ABS plastic top &amp; body includes push rod, no overflow.</Description>
          <ImageFile>100024.jpg</ImageFile>
          <ItemNumber>100024</ItemNumber>
          <ModelNumber>64707</ModelNumber>
        </Product>
        <Product>
          <Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
          <Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high &amp; low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>
          <ImageFile>100073.jpg</ImageFile>
          <ItemNumber>100024</ItemNumber>
          <ModelNumber>64707</ModelNumber>
        </Product>
      </Products>
    </ProductCatalogImport>

Code that writes the second image on a separate row (as requested in the comments).

    require 'csv'
    require 'nokogiri'

    xml = File.read('roduct_catalog.xml')
    doc = Nokogiri::XML(xml)
    all_the_things = []

    doc.xpath('//Products/Product').each do |file|
      handle = file.xpath("./ItemNumber").first.text
      title = file.xpath("./Name").first.text
      description = file.xpath("./Description").first.text
      # Commented out because the ./FLDeptName node is not present:
      # collection = file.xpath("./FLDeptName").first.text
      image1 = "www.mybaseurl.com/" + file.xpath("./ImageFile").first.text
      # ImageFile2 is optional, so check that the node exists before reading it.
      if file.xpath("./ImageFile2").size() > 0
        image2 = "www.mybaseurl.com/" + file.xpath("./ImageFile2").first.text
      else
        image2 = ''
      end
      # all_the_things << [handle, title, description, collection, image1]
      all_the_things << [handle, title, description, image1, image2]
    end

    CSV.open('product_file_1.csv', 'wb') do |row|
      # Header without 'collection', since ./FLDeptName is not present.
      row << ['handle', 'title', 'description', 'image1', 'image2']
      all_the_things.each do |data|
        if data[-1] != ''
          # Write the product row without image2, then a second row that
          # repeats the handle and carries only the second image.
          row << data[0...-1]
          row << [data[0], '', '', '', data[-1]]
        else
          row << data
        end
      end
    end

Here is the output. [image]
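
The row-splitting rule in the code above (the first image stays on the full product row, and an additional image gets its own row that repeats only the handle) can also be written as a small function that accepts any number of image values, for example ImageFile plus the AltImageFile1 from the question. This is a rough sketch, not the only way to do it: `expand_rows` is a hypothetical name, the sample values are shortened, and putting every image in a single `image` column is an assumption about the layout the target CSV should have.

```ruby
require 'csv'

# Build the CSV rows for one product (hypothetical helper).
# `images` holds the image URLs in order; nil or blank entries are skipped.
def expand_rows(handle, title, description, images)
  present = images.reject { |image| image.nil? || image.strip.empty? }
  first, *rest = present
  # The first row carries the product data and the first image (if any).
  rows = [[handle, title, description, first.to_s]]
  # Every further image gets its own row that repeats only the handle.
  rest.each { |image| rows << [handle, '', '', image] }
  rows
end

rows = expand_rows('100024', 'Axe Handle', 'Hickory handle.',
                   ['www.mybaseurl.com/100024.jpg', 'www.mybaseurl.com/103387-1.jpg'])
rows += expand_rows('100073', 'Attic Fan', 'Belt-drive fan.',
                    ['www.mybaseurl.com/100073.jpg', ''])

# Generate the CSV in memory here; CSV.open works the same way for a file.
csv_text = CSV.generate do |csv|
  csv << %w[handle title description image]
  rows.each { |row| csv << row }
end

puts csv_text
```

In the Nokogiri loop, the `images` array would be filled from the product's image nodes, so a product with a blank alternate image produces one row and a product with two images produces two.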