Converting XML to native Ruby data structures

Question (votes: 3, answers: 4)

I'm fetching data from an API that returns XML like this:

<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>

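I'm new to deserialization, but I think the appropriate thing is to parse this XML into a Ruby object that I can then reference like objectFoo.seriess.series.frequency, which would return 'Quarterly'.

From my searching here and on Google, there doesn't seem to be an obvious solution for this in plain Ruby (NOT Rails), which makes me think I'm missing something fairly obvious. Any ideas?

Edit: I set up a test case based on Winfield's suggestion.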

class Exopenstruct

  require 'ostruct'

  def initialize()  

  hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}

  object_instance = OpenStruct.new( hash )

  end
end

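In irb I loaded the .rb file and instantiated the class. However, when I try to access an attribute (e.g. instance.seriess), I get: NoMethodError: undefined method `seriess'

Apologies again if I'm missing something obvious.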

ruby xml
4 Answers
14
votes

You're probably best off using a standard XML-to-Hash parser, such as the one included with Rails:

object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']

If you're not using the Rails stack, you can use a library like Nokogiri to get the same behavior.
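A minimal sketch of the non-Rails route. It uses REXML instead of Nokogiri purely to avoid an extra dependency (REXML normally ships with Ruby as a bundled gem). element_to_hash is a hypothetical helper, not part of any library, and it only approximates what Hash.from_xml does: attributes and child elements share one set of keys, and repeated child elements are collected into an Array.

```ruby
require 'rexml/document'

# Hypothetical helper: turns an REXML element into a nested Hash.
# Attributes and child elements end up as keys of the same Hash.
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash[name] = value }
  element.elements.each do |child|
    nested = child.has_elements? || child.has_attributes?
    value = nested ? element_to_hash(child) : child.text
    if hash.key?(child.name)
      # A repeated name becomes an Array of values.
      hash[child.name] = [hash[child.name]].flatten(1) + [value]
    else
      hash[child.name] = value
    end
  end
  hash
end

xml = <<~XML
  <seriess realtime_start="2013-01-28">
    <series id="GDPC1" frequency="Quarterly"/>
  </seriess>
XML

root = REXML::Document.new(xml).root
data = { root.name => element_to_hash(root) }
puts data['seriess']['series']['frequency']
```

With the shortened XML above, the last line prints Quarterly.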

Edit: If you're looking for object-like behavior, OpenStruct is a good way to wrap the hash:

object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess

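Note: For deeply nested data, you may also need to recursively convert the embedded hashes into OpenStruct instances. I.e., if the attribute above is a hash of values, it will be a Hash rather than an OpenStruct.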
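The recursive conversion mentioned in the note could look roughly like this. deep_ostruct is a hypothetical helper name, and the hash is a shortened version of the one in the question.

```ruby
require 'ostruct'

# Hypothetical helper: wraps every Hash (including Hashes inside Arrays)
# in an OpenStruct; all other values are returned unchanged.
def deep_ostruct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_ostruct(v) })
  when Array
    value.map { |v| deep_ostruct(v) }
  else
    value
  end
end

hash = {
  'seriess' => {
    'realtime_start' => '2013-02-01',
    'series' => { 'id' => 'GDPC1', 'frequency' => 'Quarterly' }
  }
}

object_foo = deep_ostruct(hash)
puts object_foo.seriess.series.frequency
```

This prints Quarterly. Note that in the question's test class the OpenStruct is assigned to a local variable inside initialize, so the Exopenstruct instance itself never responds to seriess; the struct has to be returned or stored in an instance variable with a reader.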


4
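votes

I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm really pleased with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML, and you get back a complete tree of bona fide Ruby objects.

I've used it to parse multi-megabyte XML files and found it fast and reliable. Take a look at the README.

One tip: because strings in XML files sometimes have encoding problems, you may need to sanitize your XML like this: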

def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

Do this before passing the string to the #parse method, to avoid Nokogiri's Input is not proper UTF-8, indicate encoding ! error.

Update

I went ahead and converted the OP's example to HappyMapper:
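One thing to be aware of: because the helper transcodes from 'binary', every byte above 0x7F counts as undefined and is dropped, so valid non-ASCII characters disappear along with the broken ones. The sketch below compares it with an assumed alternative, scrub_utf8 (a hypothetical name), which reinterprets the bytes as UTF-8 and uses String#scrub to remove only the invalid sequences.

```ruby
# The helper from the answer: transcodes from 'binary' to UTF-8.
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Assumed alternative: keep valid UTF-8, drop only invalid byte sequences.
def scrub_utf8(xml)
  xml.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# "\xC3\xA9" is a valid UTF-8 "é"; "\xFF" is never valid in UTF-8.
raw = "<t>caf\xC3\xA9 \xFF</t>".b

stripped = sanitize(raw)    # "é" is lost together with the bad byte
scrubbed = scrub_utf8(raw)  # "é" survives, only "\xFF" is removed

puts stripped
puts scrubbed
```

Which variant is appropriate depends on whether the input is expected to contain legitimate non-ASCII text.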

XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

require 'date'
require 'happymapper'           # provided by the nokogiri-happymapper gem

class Series; end               # forward reference

class Seriess
  include HappyMapper
  tag 'seriess'

  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end
class Series
  include HappyMapper
  tag 'series'

  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

def test
  Seriess.parse(XML_STRING, :single => true)
end

Here's what you can do with it:

>> a = test
>> a.class
=> Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93

1
vote

Nokogiri solves the parsing problem. What you do with the data is up to you; here I use OpenStruct as an example:

require 'nokogiri'
require 'ostruct'
require 'open-uri'

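# Kernel#open no longer opens URLs as of Ruby 3.0; use URI.open from open-uri.
# Nokogiri.parse treats an IO argument as HTML, which is why `body` below
# contains the text of the whole note.
doc = Nokogiri.parse URI.open('http://www.w3schools.com/xml/note.xml')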

note = OpenStruct.new

note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text

=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="ToveJaniReminderDon't forget me this weekend!\r\n">

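This is just a teaser; your problem may be many times larger in scope. It's only meant to give you a head start.


Edit: While digging through Google and Stack Overflow, I came across a possible hybrid of my answer and @Winfield's, which uses Rails' Hash.from_xml: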

> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}

You can then use this hash to, for example, initialize a new ActiveRecord::Base model instance, or whatever else you decide to do with it.

http://nokogiri.org/ http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html https://stackoverflow.com/a/7488299/1740079


0
votes

If you want to convert XML to a Hash, I've found the nori gem to be the simplest option.

Example:

require 'nori'

xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

hash = Nori.new.parse(xml)    
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']

Note the '@' in front of frequency: it is there because frequency is an attribute of 'series' rather than an element.
