improving parser code

Question

In a toy language I wrote (very LISP-like), the most inelegant part is the parsing.

The language is organized as one class per programming construct, each with its own parsing method. If a parsing method returns nil (meaning it could not parse that construct at the current position in the token stream), the next parser is tried, so the failing method has to restore the token stream to its original state first.

This results in methods like this one:

override class func parse(ts: TokenStream) -> Program? {
    // save position in the stream
    let oldpos = ts.pos
    // build a closure for resetting position in stream and returning nil
    let abort = {() -> Program? in ts.pos = oldpos; return nil}
    // try to read '(', or abort
    guard let t1 = ts.read() where t1.value == "(" else {return abort()}
    // try to read 'while', or abort
    guard let t2 = ts.read() where t2.value == "while" else {return abort()}
    // try to parse a boolean expression, or abort
    guard let cond = BoolExpr.parse(ts) else {return abort()}
    // try to parse a program, or abort
    guard let body = Program.parse(ts) else {return abort()}
    // try to read ')' or abort
    guard let t3 = ts.read() where t3.value == ")" else {return abort()}
    // success: return the AST node
    return While(cond, body)
}

The code above will parse and return a while construct from ( while <expr> <program> ), or will return nil from anything else.

The use of guard let ... and the little abort closure certainly makes it more compact, but it is still far from elegant.

How would you refactor it?

Note: I tried with parser combinators but I don't like them.

Note 2: without guard and the abort closure, the above code would look like:

override class func parse(ts: TokenStream) -> Program? {
    let oldpos = ts.pos
    if let t1 = ts.read() {
        if t1.value == "(" {
            if let t2 = ts.read() {
                if t2.value == "while" {
                    if let cond = BoolExpr.parse(ts) {
                        if let body = Program.parse(ts) {
                            if let t3 = ts.read() {
                                if t3.value == ")" {
                                    return While(cond, body)
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    ts.pos = oldpos
    return nil
}

As we all want to make our code more efficient or improve it in one way or another, try to write a title that summarizes what your code does, not what you want to get out of a review. — Jamal♦, Jan 14 '16 at 23:53
Please tell us about the language or grammar construct that you want to parse? "This result in such methods" sounds like you want us to review a general idea rather than your actual code. — 200_success, Jan 15 '16 at 0:00

Jerry Coffin · Accepted Answer · 2016-01-15 01:45:39Z

This is a back-tracking parser. Unless you're stuck with a language (e.g., Fortran) that leaves you nearly no other choice, you usually want to avoid that.

The usual way to avoid it is to look ahead a token, and based on that token decide what to do/parse/look for next. At a given point in a program, there are usually only a few general kinds of things allowed. If what you find doesn't fit with what's allowed, you generally want to diagnose the problem then (typically) quit--there's rarely much to gain from attempting to continue parsing after that point, and therefore no need to restore the parser's state.

For the statement above, you'd typically end up with something like this (using a C-like syntax for the moment):

read_lparen(ts);
read_while(ts);
read_expr(ts);
read_program(ts);
read_rparen(ts);

Each of these looks for its specific item and prints an error message if it doesn't find it. For example, given input like (while while), it would print something like: expected expression, found 'while'.
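In Swift, readers of this kind could be sketched roughly as follows. This is only an illustration under assumptions: the Token, TokenStream, and ParseError types are minimal stand-ins (the question does not show its real ones), readBoolExpr is a hypothetical placeholder for BoolExpr.parse that accepts only true and false, and the code uses current Swift syntax rather than the Swift 2 style of the question.

```swift
// Hypothetical minimal types; the question's real Token and TokenStream are not shown.
struct Token { let value: String }

struct ParseError: Error, CustomStringConvertible {
    let description: String
}

final class TokenStream {
    private let tokens: [Token]
    private var pos = 0

    init(_ words: [String]) { tokens = words.map { Token(value: $0) } }

    // Consume and return the next token, or nil at end of input.
    func read() -> Token? {
        guard pos < tokens.count else { return nil }
        defer { pos += 1 }
        return tokens[pos]
    }
}

// Consume one token and check it; report a diagnostic instead of rewinding.
func expect(_ expected: String, in ts: TokenStream) throws {
    guard let t = ts.read() else {
        throw ParseError(description: "expected '\(expected)', found end of input")
    }
    guard t.value == expected else {
        throw ParseError(description: "expected '\(expected)', found '\(t.value)'")
    }
}

// Placeholder for BoolExpr.parse: accepts only the literals true and false.
func readBoolExpr(_ ts: TokenStream) throws -> Bool {
    guard let t = ts.read() else {
        throw ParseError(description: "expected expression, found end of input")
    }
    switch t.value {
    case "true": return true
    case "false": return false
    default: throw ParseError(description: "expected expression, found '\(t.value)'")
    }
}
```

With these, the body of a while parser becomes a straight sequence of try expect("(", in: ts), try expect("while", in: ts), and so on. The first failure propagates to the caller as an error, so there is no saved position and no abort closure.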

Having done that, there's no need to backtrack to try to find whether the thing following the first while might be a program--even if it is, the code isn't valid, because (while <program>) isn't valid anyway.
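To choose between several constructs without backtracking, the parser can peek at the next token and branch on it. The sketch below is one possible shape, not the asker's actual design: Node, peek(), parseForm, and the do block construct are all hypothetical names invented for the example, and the only expressions are the literals true and false. It uses current Swift syntax.

```swift
// Hypothetical minimal types, invented for this example.
struct Token { let value: String }
struct ParseError: Error, CustomStringConvertible { let description: String }

final class TokenStream {
    private let tokens: [Token]
    private var pos = 0
    init(_ words: [String]) { tokens = words.map { Token(value: $0) } }

    // Look at the next token without consuming it.
    func peek() -> Token? { pos < tokens.count ? tokens[pos] : nil }

    // Consume and return the next token, or nil at end of input.
    func read() -> Token? {
        guard let t = peek() else { return nil }
        pos += 1
        return t
    }
}

indirect enum Node {
    case bool(Bool)
    case whileLoop(cond: Node, body: Node)
    case block([Node])
}

func describe(_ t: Token?) -> String { t.map { "'\($0.value)'" } ?? "end of input" }

// Consume one token and fail with a diagnostic if it is not the expected one.
func expect(_ expected: String, _ ts: TokenStream) throws {
    let t = ts.read()
    guard t?.value == expected else {
        throw ParseError(description: "expected '\(expected)', found \(describe(t))")
    }
}

// Parse one form. Each decision is made from a single peeked token,
// so the stream position never has to be restored.
func parseForm(_ ts: TokenStream) throws -> Node {
    guard let next = ts.peek() else {
        throw ParseError(description: "expected form, found end of input")
    }
    switch next.value {
    case "true":
        _ = ts.read()
        return .bool(true)
    case "false":
        _ = ts.read()
        return .bool(false)
    case "(":
        _ = ts.read()
        let head = ts.read()
        let node: Node
        switch head?.value {
        case "while":
            let cond = try parseForm(ts)
            let body = try parseForm(ts)
            node = .whileLoop(cond: cond, body: body)
        case "do": // hypothetical block construct: ( do <form>* )
            var items: [Node] = []
            while let t = ts.peek(), t.value != ")" {
                try items.append(parseForm(ts))
            }
            node = .block(items)
        default:
            throw ParseError(description: "expected 'while' or 'do', found \(describe(head))")
        }
        try expect(")", ts)
        return node
    default:
        throw ParseError(description: "expected form, found '\(next.value)'")
    }
}
```

Calling parseForm on the tokens of (while true (do false)) builds a whileLoop node, while (while while) fails at the second while with a diagnostic. In a LISP-like grammar the token after "(" is enough to select the construct, which is why a single token of look-ahead suffices here.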
