I have the following Ruby code, which parses an XML document sax style (it's a very simplified version):
require 'nokogiri'
class ParsingError < RuntimeError; end
class Parser
def initialize mapper, collector, error_collector
@error_collector = error_collector
@sax_handler = Handler.new(collector, @error_collector, mapper)
end
def process io_stream
parser = Nokogiri::XML::SAX::Parser.new(@sax_handler)
parser.parse_io(io_stream, 'UTF-8')
rescue ParsingError => e
@error_collector << "ParsingError: #{e.message}"
end
end
class Handler < Nokogiri::XML::SAX::Document
def initialize collector, error_collector, mapper
@mapper = mapper
@collector = collector
@error_collector = error_collector
end
def start_element name, attrs=[]
return if name=='nodes'
@current_element = @mapper.new(name)
end
def end_element name, attrs=[]
return if @current_element.nil?
data = @current_element.get_errors
if data[:errors].empty?
@collector << @current_element.result
else
@error_collector << data
end
@current_element = nil
end
def characters string
@current_element.record(string) if @current_element
end
def error message
raise ParsingError, message
end
end
class Mapper
def initialize name
@data = ""
@name = name
end
def record str
@data << str
end
def result
{name: @name, data: @data} # here are more properties
end
def get_errors
arr = []
if @data.index("Error 123")
arr << "Error Code 123"
elsif @data.index("Error 456")
arr << "Error Code 456"
elsif @data.index("Error 789")
arr << "Error Code 789"
# and a lot more validations
end
{name: @name, errors: arr}
end
end
# Example usage
xml = %{<?xml version="1.0" encoding="UTF-8"?>
<nodes>
<node1>some more data</node1>
<node2>data Error 123</node2>
<malformed xml
</nodes>
}
collector = []
error_collector = []
mapper = Mapper
parser = Parser.new(Mapper, collector, error_collector)
parser.process(StringIO.new(xml))
puts "Data:"
puts collector
puts
puts "Errors:"
puts error_collector
My problem is the data collection and error handling.
- I'm passing two arrays around,
collector
anderror_collector
, which doesn't seem right. I want to collect data and errors in one place, but the collector itself should simply implement a<<
method (like an Array). Any advice how to improve it? - Generally, the error handling feels weird. I need it in two places: Logging parsing errors and validations on items. To identify the items I also need the name of a node, except when it's a parsing error. So there are two kinds of errors: Parsing and node validation errors. How could I improve this code and remove the code smells?
- Any other advice how to improve the code above?