How do I force parsing an XML node as hash array?

Question

This is my simplified myXML:

<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>

When I parse it with the Crack gem, myHash (shown from the <Message> level down) is:

{
   "MemberId"=>"A00000001", 
   "MemberName"=>"Bruce", 
   "Line"=>[
       {"LineNumber"=>"3.1", "Item"=>"fruit-004", "Description"=>"Peach"}, 
       {"LineNumber"=>"4.1", "Item"=>"fruit-001", "Description"=>"Peach"}
    ]
}

Crack parses Line as an array because there are two <Line> nodes in myXML. But if myXML contained only one <Line> node, Crack would not parse it as an array:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> {"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}
}

I want Line to stay an array even when there is only one <Line> node:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> [{"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}]
}

fbonetti · Accepted Answer · 2013-06-09 04:18:38Z


After you convert the XML document to a hash you could do this:

myHash["Line"] = [myHash["Line"]] if myHash["Line"].kind_of?(Hash)

This ensures that a single Line hash is always wrapped in an Array. (If there is no <Line> node at all, myHash["Line"] is still nil.)
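The one-liner above only touches a top-level "Line" key. If the parsed hash still carries the outer ShipmentRequest and Message levels, or several keys need the same treatment, a small recursive helper can apply the same idea at any depth. This is a minimal sketch: `force_array` is a hypothetical name, not part of Crack, and the sample hash is hand-written to match the shape shown in the question.

```ruby
# Hypothetical helper (not part of Crack): return a copy of a parsed hash in
# which the values under the given keys are always Arrays, at any depth.
def force_array(obj, *keys)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), out|
      value = force_array(value, *keys)
      if keys.include?(key)
        # nil -> [], Array -> unchanged, anything else (e.g. one Hash) -> [value].
        # Kernel#Array is avoided on purpose: Array({"a" => 1}) gives [["a", 1]].
        value = value.nil? ? [] : (value.is_a?(Array) ? value : [value])
      end
      out[key] = value
    end
  when Array
    obj.map { |element| force_array(element, *keys) }
  else
    obj
  end
end

# Hand-written hash, assumed to look like Crack's output for a single <Line>.
single = {
  "ShipmentRequest" => {
    "Message" => {
      "MemberId"   => "ABC0001",
      "MemberName" => "Alan",
      "Line"       => { "LineNumber" => "4.1", "Item" => "fruit-004", "Description" => "Apple" }
    }
  }
}

fixed = force_array(single, "Line")
p fixed["ShipmentRequest"]["Message"]["Line"].length # prints 1
```

The helper builds a new hash rather than mutating the original, and a hash that has no "Line" key at all is returned without one.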

edited Jun 9, 2013 at 4:18

answered Jun 9, 2013 at 4:09

fbonetti


the Tin Man · Accepted Answer · 2013-06-09 08:03:10Z

The problem is that you're relying on Crack's code to make a decision you should really be making yourself. Crack has no way of knowing that you want a single node to become an array of one element, and that inconsistency makes it a lot more difficult to dig into that portion of the data.

Parsing XML isn't hard, and by parsing it yourself you'll know what to expect and avoid the hassle of dealing with the "sometimes it's an array, sometimes it's not" results returned by Crack.

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>
EOT

That sets up the DOM, so it can be navigated:

hash = {}
message = doc.at('Message')
hash[:member_id] = message.at('MemberId').text
hash[:member_name] = message.at('MemberName').text
lines = message.search('Line').map do |line|
  line_number = line.at('LineNumber').text 
  item = line.at('Item').text 
  description = line.at('Description').text

  {
    :line_number => line_number,
    :item        => item,
    :description => description
  }
end
hash[:lines] = lines

message = doc.at('Message') finds the first <Message> node.
message.at('MemberId').text finds the first <MemberId> node inside <Message> and returns its text.
message.at('MemberName').text does the same for the first <MemberName> node.
message.search('Line') finds all <Line> nodes inside <Message>.

From those descriptions you can figure out the rest.

After running, hash looks like:

{:member_id=>"A00000001",
 :member_name=>"Bruce",
 :lines=>
  [{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"},
   {:line_number=>"4.1", :item=>"fruit-001", :description=>"Peach"}]}

If I remove one of the <Line> blocks from the XML, and re-run, I get:

{:member_id=>"A00000001",
 :member_name=>"Bruce",
 :lines=>[{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"}]}

Using search to locate the <Line> nodes is the trick. search returns a NodeSet, which behaves like an Array whether it holds zero, one, or many nodes, so mapping over it always returns an array of hashes, one per <Line> tag (or an empty array if there are none).
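If you'd rather avoid a gem dependency, the same search-then-map idea can be sketched with REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). This swaps REXML in for Nokogiri rather than reproducing the answer's code; `parse_shipment` is a hypothetical name, and the sketch assumes the document contains a <Message> element. `REXML::XPath.match` returns a plain Array, so a single <Line> still yields a one-element array.

```ruby
require 'rexml/document'

# Sketch of the search-then-map approach using REXML instead of Nokogiri.
# parse_shipment is a hypothetical helper name; assumes <Message> exists.
def parse_shipment(xml)
  doc = REXML::Document.new(xml)
  message = REXML::XPath.first(doc, '//Message') # first match, like Nokogiri's at

  # XPath.match always returns an Array: one <Line> gives a one-element
  # array, and no <Line> gives [].
  lines = REXML::XPath.match(message, 'Line').map do |line|
    {
      line_number: line.elements['LineNumber']&.text, # &. yields nil if the child is missing
      item:        line.elements['Item']&.text,
      description: line.elements['Description']&.text
    }
  end

  {
    member_id:   message.elements['MemberId']&.text,
    member_name: message.elements['MemberName']&.text,
    lines:       lines
  }
end

xml = <<~XML
  <?xml version="1.0" encoding="utf-8"?>
  <ShipmentRequest>
    <Message>
      <MemberId>ABC0001</MemberId>
      <MemberName>Alan</MemberName>
      <Line>
        <LineNumber>4.1</LineNumber>
        <Item>fruit-004</Item>
        <Description>Apple</Description>
      </Line>
    </Message>
  </ShipmentRequest>
XML

result = parse_shipment(xml)
p result[:lines].length # prints 1
```

The safe-navigation calls are a design choice: with Nokogiri's at (or REXML's elements[]), a missing child returns nil, and calling .text on nil would raise NoMethodError.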

Nokogiri is a great tool for parsing HTML and XML and then searching, adding, changing, or removing nodes. It supports CSS and XPath selectors, so if you're used to jQuery, CSS, or XPath expressions, you'll be off and running quickly. The Nokogiri tutorials are a good place to start learning how it works.

Hi @the Tin Man! Thanks for your input. I'm really new to this stuff, but I had the impression that the Crack gem is pretty handy when I just want to convert XML files into hashes in bulk, and in particular that the result is easy to import into MongoDB. Once they're converted, I can polish the hashes by fixing up the nodes I need as arrays. :) — Askar, Commented Jun 9, 2013 at 7:35
Crack is handy with very simple XML/JSON, but the larger the data set, the harder extracting data can become. Learn to do this stuff directly using Nokogiri and you'll never look at another XML parser again. — the Tin Man, Commented Jun 9, 2013 at 7:58
