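I am using Nokogiri to convert a fairly large XML file to CSV. My client's file has over 80,000 rows. I have two questions.
First question: I need to bulk-edit a node into something like (www.mybaseurl.com + the text of <ImageFile />), so that it holds the full image path. I have looked all through their documentation and still can't find a solution to my problem. The same goes for Stack Overflow.
Second question: I want Ruby to check whether the column is blank. If it is not blank, I need the code to create a new row below with the same handle value, but with <AltImageFile1> in place of <ImageFile />.
Here is a sample of the XML file I am working with: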
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ImageFile>100024.jpg</ImageFile>
<AltImageFile1>103387-1.jpg</AltImageFile1>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
<Product>
<Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
<Description>Classic chrome finish with ABS plastic top & body includes push rod, no overflow.</Description>
<ImageFile>100024.jpg</ImageFile>
<AltImageFile1>103429-1.jpg</AltImageFile1>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
<Product>
<Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
<Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high & low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>
<ImageFile>100073.jpg</ImageFile>
<AltImageFile1/>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
</Products>
Here is my Ruby code. How can I improve it? I would appreciate any explanation.
require 'csv'
require 'nokogiri'
xml = File.read('Desktop/roduct_catalog.xml')
doc = Nokogiri::XML(xml)
all_the_things = []
doc.xpath('//Products/Product').each do |file|
  handle = file.xpath("./ItemNumber").first.text
  title = file.xpath("./Name").first.text
  description = file.xpath("./Description").first.text
  collection = file.xpath("./FLDeptName").first.text
  image1 = file.xpath("./ImageFile").first.text
  all_the_things << [handle, title, description, collection, image1]
end
CSV.open('product_file_1.csv', 'wb') do |row|
  row << ['handle', 'title', 'description', 'collection', 'image1']
  all_the_things.each do |data|
    row << data
  end
end
The sample XML has no FLDeptName node, so the lines related to that node are commented out.
require 'csv'
require 'nokogiri'
xml = File.read('roduct_catalog.xml')
doc = Nokogiri::XML(xml)
all_the_things = []
doc.xpath('//Products/Product').each do |file|
  handle = file.xpath("./ItemNumber").first.text
  title = file.xpath("./Name").first.text
  description = file.xpath("./Description").first.text
  # collection = file.xpath("./FLDeptName").first.text # <== commented out because the ./FLDeptName node is not present
  image1 = "www.mybaseurl.com/" + file.xpath("./ImageFile").first.text
  # all_the_things << [handle, title, description, collection, image1] # <== commented out because the ./FLDeptName node is not present
  all_the_things << [handle, title, description, image1]
end
CSV.open('product_file_1.csv', 'wb') do |row|
  # row << ['handle', 'title', 'description', 'collection', 'image1'] # <== commented out because the ./FLDeptName node is not present
  row << ['handle', 'title', 'description', 'image1']
  all_the_things.each do |data|
    row << data
  end
end
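One caveat with the string concatenation above: a product whose <ImageFile> is empty would end up with a bare "www.mybaseurl.com/" in the CSV. A minimal sketch of a guard in plain Ruby follows. The helper name image_url and the BASE_URL constant are hypothetical (they are not part of Nokogiri), and the base URL is just the placeholder from the question.

```ruby
# Hypothetical helper, not part of Nokogiri: builds the full image path
# from a file name and returns '' when the name is nil or blank.
BASE_URL = 'www.mybaseurl.com'

def image_url(filename, base: BASE_URL)
  name = filename.to_s.strip
  return '' if name.empty?

  # Drop a trailing slash on the base and leading slashes on the name
  # so the two parts are joined by exactly one '/'.
  "#{base.chomp('/')}/#{name.sub(%r{\A/+}, '')}"
end
```

Inside the loop this could be called as image_url(file.at_xpath("./ImageFile")&.text), which should also avoid a NoMethodError when the node is missing altogether.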
Here is the output of the script that writes product_file_1.csv.
Edit 1: Based on the comments.
Sample XML with two images:
<?xml version="1.0"?>
<ProductCatalogImport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Products>
<Product>
<Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
<Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
<ImageFile>100024.jpg</ImageFile>
<ImageFile2>100024-2.jpg</ImageFile2>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
<Product>
<Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
<Description>Classic chrome finish with ABS plastic top & body includes push rod, no overflow.</Description>
<ImageFile>100024.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
<Product>
<Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
<Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high & low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>
<ImageFile>100073.jpg</ImageFile>
<ItemNumber>100024</ItemNumber>
<ModelNumber>64707</ModelNumber>
</Product>
</Products>
</ProductCatalogImport>
Code to write the contents on separate rows (as requested in the comments):
require 'csv'
require 'nokogiri'
xml = File.read('roduct_catalog.xml')
doc = Nokogiri::XML(xml)
all_the_things = []
doc.xpath('//Products/Product').each do |file|
  handle = file.xpath("./ItemNumber").first.text
  title = file.xpath("./Name").first.text
  description = file.xpath("./Description").first.text
  # collection = file.xpath("./FLDeptName").first.text # <== commented out because the ./FLDeptName node is not present
  image1 = "www.mybaseurl.com/" + file.xpath("./ImageFile").first.text
  if file.xpath("./ImageFile2").size() > 0
    image2 = "www.mybaseurl.com/" + file.xpath("./ImageFile2").first.text
  else
    image2 = ''
  end
  # all_the_things << [handle, title, description, collection, image1] # <== commented out because the ./FLDeptName node is not present
  all_the_things << [handle, title, description, image1, image2]
end
CSV.open('product_file_1.csv', 'wb') do |row|
  # row << ['handle', 'title', 'description', 'collection', 'image1'] # <== commented out because the ./FLDeptName node is not present
  row << ['handle', 'title', 'description', 'image1', 'image2']
  all_the_things.each do |data|
    if data[-1] != ''
      row << data[0...-1]
      row << [data[0], '', '', '', data[-1]]
    else
      row << data
    end
  end
end
Here is the output.
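The code above puts the second image in its own image2 column. If the extra row should instead carry the alternate image in the same image column, which is how the question phrases it, the row-splitting step can be kept separate from the XML parsing. The following is only a sketch in plain Ruby, assuming each product has already been reduced to a hash. The expand_rows name, the hash keys, and the shortened sample values are all hypothetical.

```ruby
require 'csv'

HEADERS = %w[handle title description image].freeze

# Hypothetical helper: returns one row for the product, plus one extra row
# holding only the handle and the alternate image when one is present.
def expand_rows(product)
  rows = [[product[:handle], product[:title], product[:description], product[:image]]]
  alt = product[:alt_image].to_s.strip
  rows << [product[:handle], nil, nil, alt] unless alt.empty?
  rows
end

# Made-up, shortened sample data standing in for the parsed XML.
products = [
  { handle: '100024', title: 'Axe Handle', description: 'Hickory',
    image: 'www.mybaseurl.com/100024.jpg',
    alt_image: 'www.mybaseurl.com/103387-1.jpg' },
  { handle: '100073', title: 'Attic Fan', description: 'Belt drive',
    image: 'www.mybaseurl.com/100073.jpg', alt_image: '' }
]

csv_text = CSV.generate do |csv|
  csv << HEADERS
  products.each do |product|
    expand_rows(product).each { |row| csv << row }
  end
end

puts csv_text
```

Keeping the expansion in a small function makes the blank check easy to test on its own, and every row then has the same number of cells as the header.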