The question is:

Implement an XML/HTML parser. The input is a tokenizer object; you call its getNextToken() function to get the next token. Each token has a name and a tag: the name is the specific text, and the tag is one of three kinds: open, close, or text. Use all the tokens to build a tree. For example:

<html>
    <user>
        <id>aa</id>
        <meta>bb</meta>
    </user>
</html>

Tokens:

("html","open") ("user","open") ("id","open") ("aa","text") ("id","close") 
("meta","open") ("bb","text") ("meta","close") ("user","close") ("html","close")

I implemented the following, but am I understanding and answering the question correctly? And how can I implement it in Java?

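class Node(object):
    def __init__(self):
        self.text = None
        # Keyed by tag name, so a repeated sibling tag replaces the earlier one.
        self.children = {}

class Solution(object):
    def xmlParser(self, tokens):
        root = Node()
        s = [root]  # stack of currently open nodes; s[-1] is the innermost
        for t in tokens:  # t is a (name, tag) pair
            print(t)
            if t[1] == 'open':
                # Attach a new child to the innermost open node, then descend.
                tmp = Node()
                s[-1].children[t[0]] = tmp
                s.append(tmp)
            elif t[1] == 'close':
                # The innermost element is finished.
                s.pop()
            elif t[1] == 'text':
                s[-1].text = t[0]
        return root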

Thank you in advance; I will accept and upvote the answer.

Stack Overflow is not a code writing service. You are expected to try to write the code yourself. After doing more research if you have a problem you can post what you've tried with a clear explanation of what isn't working and providing a Minimal, Complete, and Verifiable example. I suggest reading how to ask a good question. – John Conde 3 hours ago
    
@JohnConde Don't think he is asking to write the whole code. It's already written. The question seems valid to me. – Ly Maneug 3 hours ago
    
Asking how to write it in Java is asking for the code. Additionally, can't they run this script and verify the output themselves? – John Conde 3 hours ago
    
@JohnConde At least, could you address whether my implementation properly answers the question given? – Jo Ko 49 mins ago

The closest Java API to what you've presented is javax.xml.stream. There are many others.

Using the stream API, parsing would look like this:

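import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreamExample {
    public static void main(String[] args) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream("test.xml"));

        int event;
        // Pull events one at a time until the end of the document.
        while ((event = reader.next()) != XMLStreamConstants.END_DOCUMENT) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("Start " + reader.getLocalName());
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    System.out.println("End   " + reader.getLocalName());
                    break;

                // Also reported for the whitespace between elements.
                case XMLStreamConstants.CHARACTERS:
                    System.out.println("Text  '" + reader.getText() + "'");
                    break;
            }
        }
        reader.close();
    }
}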
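
To get from these events to a tree, the same stack idea as the Python version in the question carries over fairly directly. Here is a minimal sketch, not a definitive implementation: TreeNode and TreeBuilder are hypothetical names made up for illustration (they are not part of javax.xml.stream), the input is read from a string rather than a file, and whitespace-only text is skipped. Children are kept in a list so that repeated sibling tags are preserved.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical node type for illustration only.
class TreeNode {
    final String name;
    String text;                                        // null when the element has no text
    final List<TreeNode> children = new ArrayList<>();  // list, so repeated tags are kept

    TreeNode(String name) { this.name = name; }
}

class TreeBuilder {
    // Builds a tree from StAX events using a stack of currently open elements.
    static TreeNode parse(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        TreeNode root = new TreeNode("#root");  // synthetic root, as in the Python version
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);

        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    // "open": attach a child to the innermost open element, then descend
                    TreeNode node = new TreeNode(reader.getLocalName());
                    stack.peek().children.add(node);
                    stack.push(node);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    // "close": the innermost element is finished
                    stack.pop();
                    break;
                case XMLStreamConstants.CHARACTERS:
                    // "text": skip the whitespace between elements
                    if (!reader.isWhiteSpace()) {
                        stack.peek().text = reader.getText().trim();
                    }
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return root;
    }

    public static void main(String[] args) throws Exception {
        TreeNode root = parse("<html><user><id>aa</id><meta>bb</meta></user></html>");
        TreeNode user = root.children.get(0).children.get(0);
        System.out.println(user.name + " has " + user.children.size() + " children");
    }
}
```

If you are given a tokenizer with getNextToken() instead of an XMLStreamReader, the loop body should stay the same: open maps to START_ELEMENT, close to END_ELEMENT, and text to CHARACTERS.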

As an alternative, you could use the DOM API to parse the document into an in-memory tree, use the SAX API to push events to a consumer, or use one of the many object serialization APIs (JAXB, for example) to deserialize your document directly to an object.
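
For the DOM route, a rough sketch could look like the following. It uses the standard javax.xml.parsers and org.w3c.dom classes; DomExample is just an illustrative class name, and the document is parsed from an inline string (an assumption made to keep the example self-contained).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomExample {
    // Parses the whole document into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><user><id>aa</id><meta>bb</meta></user></html>");
        Element root = doc.getDocumentElement();
        // Look up the first <id> element anywhere under the root and read its text.
        String id = root.getElementsByTagName("id").item(0).getTextContent();
        System.out.println(root.getTagName() + " -> id = " + id);  // prints "html -> id = aa"
    }
}
```

Here the parser builds the tree for you, so there is no stack to manage, at the cost of holding the whole document in memory.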
