Take the 2-minute tour ×
Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development. It's 100% free, no registration required.

For fun and for practice, I'm thinking of maybe implementing a very basic programming language to run on a simple interpreter written in Java.

My question is this:

Is it a 'must' to know regular expressions when designing an interpreter, for parsing the text? Should I learn this topic before starting to implement the interpreter?

Or is it enough to just run through the text, break down into strings, etc, i.e. do 'basic parsing' without regular expressions?

share|improve this question

closed as primarily opinion-based by gnat, amon, GlenH7, MichaelT, Kilian Foth May 6 '14 at 9:18

Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.

    
How basic are we talking? 1+1 or for(i=0; i < 5; i++) puts i? –  delnan Apr 27 '14 at 17:07
    
@delnan Not sure I understand the question, but basically the language is going to look something like this (imagine a new row every semicolon): var x = 2; var y = x * 4; print y; Very simple. Is it a good idea to learn about regex? –  Aviv Cohn Apr 27 '14 at 17:09
6  
You should never parse a non-regular language with regular expressions. Never. You can only parse a regular language with regular expressions. en.wikipedia.org/wiki/Chomsky_hierarchy –  SK-logic Apr 27 '14 at 17:40
1  
@Kos, somewhat more powerful than the regular grammars, yes, but yet, totally unsuitable for parsing anything. –  SK-logic Apr 27 '14 at 20:05
1  
You should learn regular expressions whether or not you plan to write an interpreter. –  user61852 May 5 '14 at 12:50

2 Answers 2

The actual parsing usually wouldn't be done with regular expressions, or with code that is equivalent to some regular expression. You'd use them for lexing, i.e. for identifying numbers, variable names, punctuation, keywords, and so on without parsing them any further.

It's not required to actually use regular expressions in a lexer. But they are a very useful tool for describing and visualizing the syntax. Moreover, as regular expressions are basically very primitive parsers, learning the concepts surrounding them (e.g. repetition and choice) will benefit you when you turn towards parsing, since some parsing concepts are analogous to (though broader than) regex concepts.

There's also the fact that you can use regular expressions for many tasks beyond lexing; many common string processing tasks can be solved easily using them. So, in summary: You may be able to do without, but it's recommended. You don't necessarily need to learn beforehand though, you can try learning them as you implement the lexer.

share|improve this answer
    
Do you mean that regexes save me some parsing, by giving me an easier way to identify variable names etc. instead of doing a lot of parsing? –  Aviv Cohn Apr 27 '14 at 21:43
    
Yes. Regexes save you long sequences of "if the next character is like that, try this next" just to say, for example, that an identifier is a sequence of letters and digits. You can (and should, regardless of whether you use regexes or not) still put this code in a separate lexer to separate the parser from those concerns. But it'll still be the same amount of code. –  delnan Apr 27 '14 at 21:48

I believe you should not use Regular Expressions until you get to the point -- in your lexer or parser, where you can only deal with Type-3 of Chomsky hierarchy.

As @Sk-logic mentioned in the comments, parsing a non-regular language with regular expressions wouldn't work, simply because of the lack of recursive-ness in regular expressions -- in my opinion.

Parsing something like CSS Selectors could be a good fit for regular expressions, however working on anything above the Type-3 would get you in trouble at some point, I'd say.

share|improve this answer

Not the answer you're looking for? Browse other questions tagged or ask your own question.