Suppose I have a lot of large XML files I want to parse data from, each ranging in size from 10MB to well over 300MB.
I only need a small subset of the data, from a few specific elements.
For example, suppose the XML I want to parse has the following structure:
<Doc>
    <Big_Element1>
        ... LOTS of sub-elements ...
    </Big_Element1>
    .....
    <Small_Element1>
        <Sub_Element1_1 />
        ...
        <Sub_Element1_N />
    </Small_Element1>
    .....
    <Small_Element2>
        <Sub_Element2_1 />
        ...
        <Sub_Element2_N />
    </Small_Element2>
    .....
    <Big_ElementN>
        .......
    </Big_ElementN>
</Doc>
All I really need is the data from the Small_Element nodes. The Big_Element nodes are definitely very large (with many small sub-elements within them), so I'd like to avoid even entering them if I don't have to.
I initially asked this question on Stack Overflow and think I understood the answer correctly, but I want to make sure I wrote this code in the best, most efficient way possible.
My solution came out to look as follows:
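Dim doc As XmlDocument
Dim xNd As XmlNode
' Ignore insignificant whitespace, comments and processing instructions so that
' the loop below only ever sees elements (indented XML would otherwise hit the Else branch).
Dim settings As New XmlReaderSettings With {
    .IgnoreWhitespace = True,
    .IgnoreComments = True,
    .IgnoreProcessingInstructions = True
}
Using reader As XmlReader = XmlReader.Create(uri, settings)
    reader.MoveToContent()   ' Position on the root element (<Doc>)
    reader.Read()            ' Move to the first child of the root
    Do While Not reader.EOF
        If reader.NodeType = XmlNodeType.Element Then
            Select Case UCase(reader.Name)
                Case "SMALL_ELEMENT1"
                    ' ReadNode loads only this subtree and advances the reader past it
                    doc = New XmlDocument
                    xNd = doc.ReadNode(reader)
                    GetSmallElement1Data(xNd)
                Case "SMALL_ELEMENT2"
                    doc = New XmlDocument
                    xNd = doc.ReadNode(reader)
                    GetSmallElement2Data(xNd)
                Case Else
                    ' Skip the whole subtree of any element we don't care about
                    reader.Skip()
            End Select
        ElseIf reader.NodeType = XmlNodeType.EndElement Then
            ' Reached </Doc>
            Exit Do
        Else
            ' We should never get here:
            Throw New NotSupportedException("XML Structure is not as was expected")
        End If
    Loop
End Using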
GetSmallElement1Data(xNd) and GetSmallElement2Data(xNd) are easy enough for me to deal with: the nodes they receive are small, so I use XPath within them to get the data I need.
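As an illustration only, such a helper could look roughly like the sketch below. The XPath expression, the module name, and the Console.WriteLine output are assumptions made for the example and are not taken from the actual code; the real helpers would select whatever the real schema calls for.

```vbnet
Imports System
Imports System.Xml

Module SmallElementSketch
    ' Hypothetical helper: prints the name of every direct child element whose
    ' name starts with "Sub_Element1_". The XPath is a placeholder for the real query.
    Sub GetSmallElement1Data(xNd As XmlNode)
        For Each child As XmlNode In xNd.SelectNodes("*[starts-with(name(), 'Sub_Element1_')]")
            Console.WriteLine(child.Name)
        Next
    End Sub

    Sub Main()
        ' Small in-memory stand-in for the subtree that doc.ReadNode(reader) would return.
        Dim doc As New XmlDocument()
        doc.LoadXml("<Small_Element1><Sub_Element1_1 /><Sub_Element1_2 /></Small_Element1>")
        GetSmallElement1Data(doc.DocumentElement)
    End Sub
End Module
```

Because the node passed in holds only one small subtree, running XPath over it stays cheap regardless of how large the overall file is.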
But my challenge is that I'm in no way sure I coded that correctly.
Also, I know this sample code is written in VB.NET, but I'm equally comfortable with C# or VB.NET solutions.