How to make many changes to an XML file using Nokogiri


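I am using Nokogiri to convert a large XML file of more than 80,000 lines to CSV format.

I need to mass-edit the <ImageFile /> nodes into something like this:

www.mybaseurl.com + text of <ImageFile />

so that each one holds the full image path. I have gone through the Nokogiri documentation and Stack Overflow and, although this seems straightforward, I still cannot find a solution to my problem.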

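I also want to use Ruby to check whether <AltImageFile1> is empty. If it is not, I need to create a new row directly below with the same handle value, but with the value of <AltImageFile1> in place of <ImageFile />.

Like this: [image: desired output, not reproduced]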

Here is a sample of the XML file I am working with:


<Products>
  <Product>
    <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
    <Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
    <ImageFile>100024.jpg</ImageFile>
    <AltImageFile1>103387-1.jpg</AltImageFile1>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>

  <Product>
    <Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
    <Description>Classic chrome finish with ABS plastic top &amp; body includes push rod, no overflow.</Description>
    <ImageFile>100024.jpg</ImageFile>
    <AltImageFile1>103429-1.jpg</AltImageFile1>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>

  <Product>
    <Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
    <Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high &amp; low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>

    <ImageFile>100073.jpg</ImageFile>
    <AltImageFile1/>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>
</Products>

Here is my code. How can I improve it?

require 'csv'
require 'nokogiri'

xml = File.read('Desktop/roduct_catalog.xml')
doc = Nokogiri::XML(xml)

all_the_things = []

doc.xpath('//Products/Product').each do |file|
  handle = file.xpath("./ItemNumber").first.text 
  title          = file.xpath("./Name").first.text
  description       = file.xpath("./Description").first.text
  collection = file.xpath("./FLDeptName").first.text 
  image1 = file.xpath("./ImageFile").first.text 
  all_the_things << [ handle, title, description, collection, image1]
end


CSV.open('product_file_1.csv', 'wb' ) do |row|
  row << [ 'handle', 'title', 'description', 'collection', 'image1']
  all_the_things.each do |data|
    row << data
  end
end

ruby xml csv xpath nokogiri
2 answers

0 votes

Here is code you can try. I do not see an FLDeptName node in the XML, so I commented out the lines related to that node.

require 'csv'
require 'nokogiri'

xml = File.read('roduct_catalog.xml')
doc = Nokogiri::XML(xml)

all_the_things = []

doc.xpath('//Products/Product').each do |file|
  handle = file.xpath("./ItemNumber").first.text
  title = file.xpath("./Name").first.text
  description = file.xpath("./Description").first.text
  # collection = file.xpath("./FLDeptName").first.text # commented out because the ./FLDeptName node is not present
  image1 = "www.mybaseurl.com/" + file.xpath("./ImageFile").first.text
  # all_the_things << [handle, title, description, collection, image1] # commented out because the ./FLDeptName node is not present
  all_the_things << [handle, title, description, image1]
end


CSV.open('product_file_1.csv', 'wb') do |row|
  # row << ['handle', 'title', 'description', 'collection', 'image1'] # commented out because the ./FLDeptName node is not present
  row << ['handle', 'title', 'description', 'image1']
  all_the_things.each do |data|
    row << data
  end
end

Here is the output: [image not reproduced]


XML example with two images:


0 votes

I would start with something like this:

require 'csv'
require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<Products>
  <Product>
    <Name>36-In. Homeowner Bent Single-Bit Axe Handle</Name>
    <Description>This single bit curved grip axe handle is made for 3 to 5 pound axes. A good quality replacement handle made of American hickory with a natural wax finish. Hardwood handles do not conduct electricity and American Hickory is known for its strength, elasticity and ability to absorb shock. These handles provide exceptional value and economy for homeowners and other occasional use applications. Each Link handle comes with the required wedges, rivets, or epoxy needed for proper application of the tool head.</Description>
    <ImageFile>100024.jpg</ImageFile>
    <AltImageFile1>103387-1.jpg</AltImageFile1>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>

  <Product>
    <Name>1-1/4-Inch Lavatory Pop Up Assembly</Name>
    <Description>Classic chrome finish with ABS plastic top &amp; body includes push rod, no overflow.</Description>
    <ImageFile>100024.jpg</ImageFile>
    <AltImageFile1>103429-1.jpg</AltImageFile1>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>

  <Product>
    <Name>30-Inch Belt-Drive Whole-House Attic Fan With Shutter</Name>
    <Description>The 30" belt drive whole house fan (5700 CFM) with automatic shutter helps cool living spaces up to 1900 square feet. It runs on high &amp; low and a 2 speed wall switch is included. The automatic shutter is white. It needs 1095 square inches of open exhaust vents in attic space, with a rough opening of 34-1/4" x 29". You do have to cut joist when installing fan, with the motor mounted on struts above housing. The fan will be quieter than direct drive models. There is a 10 year limited parts warranty, 5 year limited labor warranty.</Description>

    <ImageFile>100073.jpg</ImageFile>
    <AltImageFile1/>
    <ItemNumber>100024</ItemNumber>
    <ModelNumber>64707</ModelNumber>
  </Product>
</Products>
EOT

Here is the logic:
