This is my simplified myXML:

<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>

When I parse it with the Crack gem myHash is:

{
   "MemberId"=>"A00000001", 
   "MemberName"=>"Bruce", 
   "Line"=>[
       {"LineNumber"=>"3.1", "Item"=>"fruit-004", "Description"=>"Peach"}, 
       {"LineNumber"=>"4.1", "Item"=>"fruit-001", "Description"=>"Peach"}
    ]
}

The Crack gem makes Line an array because there are two <Line> nodes in myXML. But if myXML contained only one <Line> node, the Crack gem would not parse it as an array:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> {"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}
}

I want Line to always be an array, even when there is only one node:

{
    "MemberId"=>"ABC0001", 
    "MemberName"=>"Alan", 
    "Line"=> [{"LineNumber"=>"4.1", "Item"=>"fruit-004", "Description"=>"Apple"}]
}

2 Answers

After you convert the XML document to a hash you could do this:

myHash["Line"] = [myHash["Line"]] if myHash["Line"].kind_of?(Hash)

This ensures that Line is always an Array, even when the XML contains only one <Line> node.
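The same check can be wrapped in a small helper that also covers the case where there is no <Line> node at all. This is only a minimal sketch: ensure_array is a hypothetical name, not part of Crack, and it assumes a missing node shows up as a missing key (nil) in the parsed hash.

```ruby
# Hypothetical helper (not part of Crack): return a copy of the hash
# in which hash[key] is guaranteed to be an Array.
def ensure_array(hash, key)
  value = hash[key]
  wrapped =
    case value
    when nil   then []       # assumption: no node means the key is missing
    when Array then value    # two or more nodes: already an Array
    else            [value]  # exactly one node: wrap the single Hash
    end
  hash.merge(key => wrapped)
end

# Note: Kernel#Array is not a substitute here, because Array(some_hash)
# converts the Hash into key/value pairs instead of wrapping it.

SINGLE = { "MemberId" => "ABC0001",
           "Line" => { "LineNumber" => "4.1", "Item" => "fruit-004" } }
MANY   = { "MemberId" => "A00000001",
           "Line" => [{ "LineNumber" => "3.1" }, { "LineNumber" => "4.1" }] }

puts ensure_array(SINGLE, "Line")["Line"].length  # 1
puts ensure_array(MANY, "Line")["Line"].length    # 2
```

Returning a merged copy rather than modifying the hash in place keeps the original parse result untouched, which can help when comparing the two while debugging.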

The problem is that you're relying on a library to do what you should really do yourself. Crack has no idea that you want a single node to become a one-element array, and its behavior makes it a lot more difficult for you to dig into that portion of the data.

Parsing XML isn't hard, and by parsing it yourself you'll know what to expect and will avoid the hassle of dealing with the "sometimes it's an array and sometimes it's not" result returned by Crack.

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-8"?>
<ShipmentRequest>
  <Message>
      <MemberId>A00000001</MemberId>
      <MemberName>Bruce</MemberName>
    <Line>
      <LineNumber>3.1</LineNumber>
      <Item>fruit-004</Item>
      <Description>Peach</Description>
    </Line>
    <Line>
      <LineNumber>4.1</LineNumber>
      <Item>fruit-001</Item>
      <Description>Peach</Description>
    </Line>
  </Message>
</ShipmentRequest>
EOT

That sets up the DOM, so it can be navigated:

hash = {}
message = doc.at('Message')
hash[:member_id] = message.at('MemberId').text
hash[:member_name] = message.at('MemberName').text
lines = message.search('Line').map do |line|
  line_number = line.at('LineNumber').text 
  item = line.at('Item').text 
  description = line.at('Description').text

  {
    :line_number => line_number,
    :item        => item,
    :description => description
  }
end
hash[:lines] = lines

  1. message = doc.at('Message') finds the first <Message> node.
  2. message.at('MemberId').text finds the first <MemberId> node inside <Message> and returns its text.
  3. message.at('MemberName').text does the same for <MemberName>.
  4. message.search('Line') looks for all <Line> nodes inside <Message>.

From those descriptions you can figure out the rest.

After running, hash looks like:

{:member_id=>"A00000001",
:member_name=>"Bruce",
:lines=>
  [{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"},
  {:line_number=>"4.1", :item=>"fruit-001", :description=>"Peach"}]}

If I remove one of the <Line> blocks from the XML, and re-run, I get:

{:member_id=>"A00000001",
:member_name=>"Bruce",
:lines=>[{:line_number=>"3.1", :item=>"fruit-004", :description=>"Peach"}]}

Using search to locate the <Line> nodes is the trick. search returns a NodeSet, which behaves much like an Array, so calling map on it always returns an array of hashes, one per <Line> tag, no matter how many <Line> nodes there are.
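The "always map over a collection" idea is not specific to Nokogiri. As a rough sketch of the same technique, here it is with REXML, which ships with Ruby; this is a swapped-in library for illustration, not what the code above uses. lines_from and the two XML constants are hypothetical names, and the hash keys are simply the tag names.

```ruby
require 'rexml/document'

# Hypothetical helper: returns an Array with one Hash per <Line> element.
def lines_from(xml)
  doc = REXML::Document.new(xml)
  # XPath.match returns a plain Array of matches (empty when nothing matches),
  # so map always produces an Array as well.
  REXML::XPath.match(doc, '//Message/Line').map do |line|
    row = {}
    # Copy each child element into the hash as "TagName" => "text".
    line.elements.each { |child| row[child.name] = child.text }
    row
  end
end

ONE_LINE_XML = <<~XML
  <ShipmentRequest>
    <Message>
      <MemberId>ABC0001</MemberId>
      <MemberName>Alan</MemberName>
      <Line>
        <LineNumber>4.1</LineNumber>
        <Item>fruit-004</Item>
        <Description>Apple</Description>
      </Line>
    </Message>
  </ShipmentRequest>
XML

NO_LINE_XML = "<ShipmentRequest><Message/></ShipmentRequest>"

puts lines_from(ONE_LINE_XML).length  # 1
puts lines_from(NO_LINE_XML).length   # 0
```

Because the result comes from map over a collection, zero, one, or many <Line> nodes all produce an Array, so the calling code never has to check the type.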

Nokogiri is a great tool for parsing HTML and XML, then allowing us to search, add, change or remove nodes. It supports CSS and XPath accessors, so if you are used to jQuery or how CSS works, or XPath expressions, you'll be off and running quickly. The tutorials for Nokogiri are a good starting place to learn how it works.

  • Hi @the Tin Man! Thanks for your input. I'm really new to this stuff, but I had the impression that the Crack gem is pretty handy when I just want to convert an XML file into a hash in bulk and, in particular, import it easily into MongoDB. After converting, I can polish the result by looking at the nodes that I need to be arrays. :)
    – Askar
    Commented Jun 9, 2013 at 7:35
  • Crack is handy with very simple XML/JSON, but the larger the data set, the harder extracting data can become. Learn to do this stuff directly using Nokogiri and you'll never look at another XML parser again.
    Commented Jun 9, 2013 at 7:58
